Abstract


The ROBUS-2 Protocol Processor (RPP) is a custom-designed
hardware component implementing the functionality of the ROBUS-2
fault-tolerant communication system. The Reliable Optical Bus (ROBUS)
is the core communication system of the Scalable Processor-Independent
Design for Enhanced Reliability (SPIDER), a general-purpose
fault-tolerant integrated modular architecture currently under development at
NASA Langley Research Center. ROBUS is a time-division multiple
access (TDMA) broadcast communication system with medium access
control by means of a time-indexed communication schedule. ROBUS-2 is
a developmental version of ROBUS providing guaranteed
fault-tolerant services to the attached processing elements (PEs) in the
presence of a bounded number of faults. These services include message
broadcast (Byzantine Agreement), dynamic communication schedule
update, time reference (clock synchronization), and distributed diagnosis
(group membership). ROBUS also features fault-tolerant startup and
restart capabilities. ROBUS-2 tolerates internal as well as PE faults and
incorporates a dynamic self-reconfiguration capability driven by the
internal diagnostic system. ROBUS consists of RPPs connected to each
other by a lower-level physical communication network. The RPP has a
pipelined architecture, and the design is parameterized in the behavioral
and structural domains. The design of the RPP enables the bus to
achieve a PE-message throughput that approaches the available
bandwidth at the physical layer.
Notation


The following table lists the symbols used in this document. Many of the symbols have generic 
descriptions and are reused where applicable. Subscripts are used to distinguish different quantities 
labelled with the same main symbol. 


Symbol 

Description 

N 

• Total number of BIUs in the system 

• Range: Positive Integers; Unit: None 

M 

• Total number of RMUs in the system 

• Range: Positive Integers; Unit: None

n 

• Total number of BIUs trusted by a node 

• Range: Naturals; Unit: None 

m 

• Total number of RMUs trusted by a node 

• Range: Naturals; Unit: None 

Ω

• Total number of nodes of the opposite kind

• Range: Positive Integers; Unit: None

Θ

• Total number of nodes of the same kind

• Range: Positive Integers; Unit: None

ω

• Total number of nodes of the opposite kind trusted by a node

• Range: Naturals; Unit: None

θ

• Total number of nodes of the same kind trusted by a node

• Range: Naturals; Unit: None

IMP(x_1, x_2)

• IMP = Integer Mid-Point function (= Rounded Average)

• Integer closest to the mid-point between x_1 and x_2

• Algorithm:

Step 1: x = (x_1 + x_2)/2

Step 2: IMP = round(x), with round(x) = ⌊x⌋ if x < ½, or ⌈x⌉ if x > ½

• Range: Integers; Unit: Same as x_1 and x_2

L_ls

• Link syndrome width; number of syndrome bits for each receiver 

• Range: Naturals; Unit: None 

L_pf

• Payload field width for ROBUS messages 

• Range: Positive Integers; Unit: None 

f 

• Clock frequency 

• Range: Reals; Unit: Hertz (Hz = Cycles per second) 

τ

• Granularity of a clock (i.e., duration of a clock tick) 

• Range: Positive Reals; Unit: Second 

t 

• Real time 

• Range: Reals; Unit: Nominal Tick 

T 

• Clock time 

• Range: Integers; Unit: Local Tick 

ρ

• Bound on the drift rate of a clock signal generated by a physical oscillator

• Range: Reals; Unit: None 

k 

• Frequency division factor (= Frequency step-down factor from physical-oscillator frequency
to local-clock frequency)

• Range: Positive Integers; Unit: None or Tick/Tick 

C_x(T)

• Earliest real time at which clock x reaches value T 

• Range: Reals; Unit: Nominal Tick 

π

• Bound on the relative time skew of particular events among specified nodes

• Range: Non-negative Reals; Unit: Tick

Π

• Bound on the observed relative time skew of particular events

• Range: Positive Integers; Unit: Local Tick

δ

• General real-time delay 

• Range: Reals; Unit: Nominal Tick 

Δ

• General clock-time delay 

• Range: Integers; Unit: Local Tick 

d 

• Message delivery delay 

• Range: Non-negative Reals; Unit: Nominal Tick 

ν

• Uncertainty in message delivery delay 

• Range: Non-negative Reals; Unit: Nominal Tick 

r 

• Message reception delay 

• Range: Non-negative Reals; Unit: Nominal Tick 

ε

• Uncertainty in message reception delay (= Network imprecision) 

• Range: Non-negative Reals; Unit: Nominal Tick 

R 

• Expected reception delay 

• Range: Naturals; Unit: Local Tick 

μ

• Bound on error of expected reception delay 

• Range: Reals; Unit: Nominal Tick 

W 

• Window size 

• Range: Naturals; Unit: Local Tick 

c 

• Computation Process delay with respect to the end of the deskew window for time -driven 
protocols 

• Range: Naturals; Unit: Local Tick 

s 

• Send Process delay with respect to a reference time for time -driven protocols 

• Range: Naturals; Unit: Local Tick 

A 

• Accept function delay with respect to the time of reception of the selected event for 
synchronization protocols 

• Range: Naturals; Unit: Local Tick 

B 

• Send Process delay with respect to a reference event for synchronization protocols 

• Range: Naturals; Unit: Local Tick 

H 

• Reset delay with respect to a reference event 

• Range: Naturals; Unit: Local Tick 

λ

• Data Introduction Interval

• Range: Reals; Unit: Nominal Tick

Λ

• Data Introduction Interval 

• Range: Positive Integers; Unit: Local Tick 

K 

• Number of messages in a stream 

• Range: Positive Integers; Unit: None 

P (uppercase) 

• Nominal resynchronization period 

• Range: Positive Integers; Unit: Local Tick 

p (lowercase) 

• Nominal resynchronization period 

• Range: Positive Reals; Unit: Nominal Tick 
1. Introduction


The Reliable Optical Bus (ROBUS) is the core communication system of the Scalable Processor- 
Independent Design for Enhanced Reliability (SPIDER), a general-purpose fault-tolerant integrated 
modular architecture (IMA) currently under development at NASA Langley Research Center. The 
purpose of this effort is to produce a flexible architecture that can be configured to satisfy a wide range of 
performance and reliability requirements, while preserving a consistent interface to application programs. 
The architecture is expected to support functions of various criticality levels, including ultra-reliable and 
safety-critical aircraft functions with hard real-time deadlines. 

ROBUS is a time-division multiple access (TDMA) broadcast communication system with medium
access control by means of a time-indexed communication schedule. ROBUS-2, a developmental version
of ROBUS, provides the following guaranteed fault-tolerant services to the attached processing elements 
(PEs): message broadcast (Byzantine Agreement), dynamic communication schedule update, time 
reference (clock synchronization), and distributed diagnosis (group membership). The bus is tolerant to 
internal as well as PE faults, and incorporates a dynamic self-reconfiguration capability driven by the 
internal diagnostic system. ROBUS-2 also features fault-tolerant startup and restart capabilities. 

ROBUS-2 is intended for laboratory experimentation and demonstration of the capability to 
reintegrate repaired nodes, dynamically update the communication schedule, and tolerate and recover 
from correlated transient faults. ROBUS-2 is also intended to demonstrate that the bus is an efficient 
communication system that can achieve a PE-message throughput that approaches the available 
bandwidth at the physical communication layer, while preserving the fault-tolerance guarantees. A 
thorough description of the high-level ROBUS-2 design can be found in [Torres 05]. 

ROBUS is a distributed system consisting of a set of dedicated ROBUS Protocol Processors (RPP) 
communicating over a lower-level physical communication network. For ROBUS-2, the RPPs are 
custom hardware-based components that implement the ROBUS functionality. The RPPs have simple
strobe-in/strobe-out synchronous interfaces to the PEs and to the lower-level communication network,
and they are capable of processing messages at a rate of one message per clock tick. The RPPs have a
large array of error detectors for self-checks, checks of remote nodes, and checks for the integrity of the
bus. The RPPs can also read and diagnose error syndromes generated by the lower-level network. 

The RPPs are described in VHDL (Very High Speed Integrated Circuit Hardware Description
Language) and are implemented on Field-Programmable Gate Arrays (FPGAs). The VHDL
code is highly parameterized in the behavioral (e.g., time to wait before asserting a signal) and structural 
(e.g., size of a buffer) domains in order to enable customization of the bus with respect to the targeted 
application. Most of the parameters must be specified before VHDL synthesis. Parameters that uniquely 
identify a particular RPP within the system (e.g., the assigned unique identification number) can be 
specified pre-synthesis or given post-synthesis as an input to the RPP. 

This document is organized as follows. First, an overview of the ROBUS-2 conceptual design is 
presented. Then, the abstract timing models used in the timing analyses are introduced. The timing 
analyses appear after that. Next, the RPP design requirements are enumerated. This is followed by a 
description of the RPP. The formulas for the behavioral and structural parameters are given after that, 
followed by a list of the variables that must be specified for a particular implementation. The appendix 
has the detailed specification for one of the components of the RPP. 
2. Overview of the ROBUS-2 communication system


This section presents a brief description of ROBUS-2, including the structure and operation of the 
system. ROBUS-2 is intended for implementations with a relatively small number of PEs, say, no more 
than seven. A more detailed description can be found in [Torres 05]. 


2.1.1. System structure 

Figure 2.1 shows the ROBUS topology. The bus has an active-star architecture with the Bus 
Interface Units (BIUs) serving as the bus access ports and the Redundancy Management Units 
(RMUs) providing connectivity as network hubs. The network between BIUs and RMUs forms a 
complete bipartite graph in which each node is directly connected to every node of the opposite kind. 
Only the links shown are available for communication. All the communication links are bidirectional. 
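As a minimal sketch of the complete bipartite connectivity described above, the following Python snippet enumerates the BIU-RMU links for a bus with a given number of BIUs and RMUs. The function name and the node labels are illustrative assumptions and are not part of the ROBUS-2 design.

```python
from itertools import product


def bipartite_links(num_bius: int, num_rmus: int) -> list[tuple[str, str]]:
    """Return every (BIU, RMU) link of a complete bipartite topology.

    Node labels such as "BIU1" are hypothetical; they only serve to make
    the link list readable.
    """
    bius = [f"BIU{i}" for i in range(1, num_bius + 1)]
    rmus = [f"RMU{j}" for j in range(1, num_rmus + 1)]
    # Each BIU is paired with every RMU; there are no BIU-BIU or RMU-RMU links.
    return list(product(bius, rmus))


links = bipartite_links(3, 2)
```

With N BIUs and M RMUs the list contains N x M entries, one per bidirectional link.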



Figure 2.1: ROBUS topology 


Figure 2.2 depicts the basic structural components of a ROBUS node. This decomposition applies to 
BIUs and RMUs. The Communication Module handles all the point-to-point communication. The links 
between BIUs and RMUs can be either one-to-one or one-to-many links, as long as broadcast 
communication is supported. The nature of the links between BIUs and PEs depends on how they are 
physically related. If each BIU and its corresponding PE are in physically separate fault-containment 
regions (FCRs) (see the Redundancy management section), then they are interconnected by a one-to- 
one data communication link. If each PE-BIU pair share an FCR, then some other means of local data 
exchange can be used. 


The Computation Module, also known as the ROBUS Protocol Processor (RPP), handles all the 
ROBUS-specific functions including mode transition logic, low-level protocols, error detection, 
diagnosis, reconfiguration, and distributed coordination. The main difference between BIUs and RMUs is 
the functionality of their RPPs. 
Figure 2.2: Generic node structure for BIUs and RMUs


2.1.2. Distributed coordination 

Each ROBUS node is driven by an independent, free-running physical oscillator. These oscillators 
are characterized by a known bound on their drift rates with respect to real time. Each node also has a 
logical-time clock, referred to as the local-time clock, which keeps track of the passage of time as 
indicated by the physical oscillator. Given an initial precision of synchronization for the local times at 
any two nodes, the precision can worsen over time at a rate determined by the drift rates of the physical 
oscillators. 
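The effect of oscillator drift on synchronization precision can be illustrated with a small calculation. This is a sketch under a stated assumption, not a formula taken from this document: if each oscillator rate deviates from nominal by at most a drift bound rho, two clocks can separate by roughly 2*rho per unit of real time, so the skew bound grows linearly from its initial value. The function and parameter names are hypothetical.

```python
def skew_bound(initial_skew: float, rho: float, elapsed: float) -> float:
    """First-order upper bound on the skew between two local clocks.

    Assumption (not from the source): each clock rate lies within
    [1 - rho, 1 + rho] of nominal, so the worst case is one clock running
    fast and the other slow, giving a divergence of 2 * rho per unit time.
    """
    if initial_skew < 0 or rho < 0 or elapsed < 0:
        raise ValueError("all arguments must be non-negative")
    return initial_skew + 2.0 * rho * elapsed


# Example: initial skew of 1 tick, drift bound of 1e-4, 10,000 ticks elapsed.
bound = skew_bound(1.0, 1e-4, 10_000)
```

A bound of this form is one way to see why the resynchronization interval is limited by the drift rates and the precision goal.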

There are two main categories of ROBUS protocols: synchronization and synchronous. The 
synchronization protocols use event-triggered communication and event-processing operations to 
generate high-precision distributed events that are used to synchronize the local-time clocks. The 
synchronous protocols process information using time-triggered communication and operations. To
achieve proper coordinated action in the execution of the synchronous protocols, the local-time clocks of 
the participating nodes must be synchronized within some known bounded precision. 

The ROBUS has two synchronization states: synchronized and unsynchronized. In the synchronized 
state, the precision of synchronization is determined by an internal distributed reference event generated 
by a clock synchronization protocol. The precision of this event allows the nodes to achieve very tight 
local-time synchronization. The bus is in the unsynchronized state when it transitions to the startup and 
restart processes. The precision of synchronization in this state is mainly determined by events not 
directly controlled by the bus. It is assumed that the synchronization precision in this mode has a known 
bound that can be large relative to the precision in the synchronized state. The bus transitions from the 
unsynchronized state to the synchronized state after the execution of a synchronization protocol. Because 
the local times can drift apart, a synchronization protocol must be re-executed at regular intervals to 
ensure that the local times are kept synchronized. The rate of re-synchronization is constrained by 
physical parameters of the design (e.g., oscillator drift rates) as well as precision and accuracy goals. The 
fault-tolerance attribute of the synchronization protocols enables the bus to achieve and maintain 
synchronization even in the presence of failed nodes. 

The execution of synchronous protocols is driven by the local time and a time-indexed operation 
schedule. The low-level distributed protocols specify the node activities by defining the operations, the 
operation sequencing, the message flow patterns, and the executing nodes for each operation. The timing 
of the operations is determined using a model of distributed synchronous composition. This execution 
scheme and the high synchronization precision in the synchronized state make the steady-state behavior 
of the ROBUS highly deterministic as it precisely specifies the timing of all the internal communication 
between BIUs and RMUs, as well as the communication with the PEs. 
2.1.3. Redundancy management

The purpose of redundancy management is to increase the probability of continued service delivery
through effective utilization of available resources. The ROBUS is designed to manage its redundant BIU 
and RMU components independently from the PEs. 

Fault containment refers to the isolation of physical faults to prevent their propagation throughout the 
system. This is achieved by establishing fault containment regions (FCR) that ensure a sufficiently high 
degree of independence with respect to physical faults. Ideally, the FCRs have separate power supplies 
and are physically and electrically isolated from each other. Communication between FCRs is through 
carefully specified interfaces that ensure a sufficiently high degree of fault containment. In the ROBUS, 
each RMU node is contained in its own FCR, and each BIU can be located by itself in a separate FCR, or 
it can share an FCR with its corresponding PE. 

Each BIU and RMU node is an observer of every node on the bus. An observed node is referred to as 
a defendant. The diagnostic system of the ROBUS is a distributed system divided into two layers. In the 
local layer, the nodes monitor the communication and independently diagnose each individual node and 
the bus as a whole. In the collective layer, the nodes exchange local diagnostic information to augment 
their local assessments. Every ROBUS node performs the diagnostic functions of error detection, node 
assessment, and bus assessment. 

Error detection is the foundation of the diagnostic system. The communication checks monitor the 
communication links between the nodes. The in-line checks are applied to the received messages and are 
based on expected timing and content characteristics. The cross-lane checks also detect errors in 
received messages by comparing them against the result of dynamic voting. The protocol checks inspect 
received messages and voting results with respect to expected properties for intermediate and final 
protocol results. The self-checks are performed by a node to monitor its own operation. PE-error 
checks inspect the messages received by the BIUs from their attached PEs. These error checks generate 
the syndromes from which diagnostic decisions are made. 

The diagnostic system assesses each node to determine its suitability to participate in the delivery of 
services to the PEs. A trustworthy node can be relied upon to deliver the expected services. 
Untrustworthy nodes do not behave as expected. A defendant is locally accused by an observer when 
the observer determines that the defendant is untrustworthy, but it is uncertain whether other observers 
have reached the same conclusion. A defendant is collectively convicted when the observers agree that a 
sufficient number of them consider the defendant untrustworthy. An observer forms a full diagnostic 
assessment of a defendant based on the local and collective diagnoses. 
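The distinction between a local accusation and a collective conviction can be sketched as a simple count. The threshold value and the data layout below are assumptions made for illustration; the actual conviction rule and the agreement protocol that makes the observers' views consistent are described in [Torres 05], not here.

```python
def is_convicted(accusations: dict[str, set[str]], defendant: str,
                 threshold: int) -> bool:
    """Return True if enough observers accuse the defendant.

    `accusations` maps each observer to the set of nodes it locally accuses.
    `threshold` is a hypothetical parameter standing in for the
    "sufficient number" of accusers mentioned in the text.
    """
    accusers = sum(1 for accused in accusations.values() if defendant in accused)
    return accusers >= threshold


local_views = {
    "BIU1": {"RMU2"},
    "BIU2": {"RMU2"},
    "BIU3": set(),
}
convicted = is_convicted(local_views, "RMU2", threshold=2)
```

In this sketch a single accusation leaves the defendant accused but not convicted, which mirrors the uncertainty an individual observer has about the conclusions of the others.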

In the context of the ROBUS, a clique is a group of BIUs and RMUs working together in a 
coordinated way to deliver services to the PEs. A clique is considered trustworthy if its services are in 
accordance with the specification. The diagnosis of the bus consists of determining if a trustworthy 
clique is in operation. 

The BIU and RMU nodes use the diagnostic assessments to determine the clique membership. A 
clique is reconfigured by adding or removing nodes from its membership. The purpose of 
reconfiguration is to enhance the ability of a clique to establish and preserve proper service delivery in 
the presence of untrustworthy nodes. A clique member is allowed to participate in the delivery of 
services to the PEs and is referred to as a trusted node. A node searching for or trying to become part of 
a clique is called a recovering node. 
The FCRs ensure that the only error propagation path between nodes is through their interfaces.
Error containment for the interfaces between BIUs and RMUs is realized by placing barriers at both 
ends of each interface. The BIUs and RMUs disable their outputs upon detection of a local failure or a 
bus failure (i.e., fail stop). At the receiving end of the interfaces, the nodes use input-error detection in 
the form of communication and in-line checks, and dynamic voting to mask undetected errors from 
trusted sources. The sources whose inputs are considered in a vote are called the eligible voters. 
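A minimal sketch of voting over eligible voters follows. It assumes a strict-majority rule over the inputs received from eligible sources; the voting function actually used by ROBUS-2 is not specified in this section, so the rule, the names, and the use of None for "no result" are all assumptions.

```python
from collections import Counter
from typing import Hashable, Optional


def vote(inputs: dict[str, Hashable], eligible: set[str]) -> Optional[Hashable]:
    """Strict-majority vote restricted to eligible voters.

    `inputs` maps a source node to the value received from it. Sources
    that are not eligible voters are ignored. Returns None when no value
    holds a strict majority of the eligible inputs (assumed behavior).
    """
    ballots = [value for source, value in inputs.items() if source in eligible]
    if not ballots:
        return None
    value, count = Counter(ballots).most_common(1)[0]
    return value if 2 * count > len(ballots) else None


received = {"RMU1": 5, "RMU2": 5, "RMU3": 9}
result = vote(received, eligible={"RMU1", "RMU2", "RMU3"})
```

Here the erroneous value from RMU3 is masked because the other two eligible voters agree.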


2.1.4. Operational modes 

Figure 2.3 shows the major mode transitions for BIU and RMU nodes. After a power-on enable, a 
node goes to the Self-Test major mode to perform a local initialization and test its circuitry. The node 
will remain in this mode indefinitely unless it successfully passes the test. A recovering node enters the 
Clique Detection mode to determine whether there is a clique operating in the Clique Preservation mode. 
If a clique is found, the recovering node transitions to the Clique Join mode, where it demonstrates to the 
clique members that it is suitable for admission. If a clique is not found, the recovering node transitions 
to the Clique Initialization mode to form a new clique. In the Clique Preservation mode, a clique 
delivers services to the PEs according to the service schedule. At any time, if a node detects a local 
failure or a bus failure, it transitions back to the Self-Test mode to reinitialize its operation and find other 
nodes suitable for providing communication services to the PEs. 
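The major mode transitions described above can be sketched as a small state machine. The event flags are hypothetical names introduced for illustration. Two details are assumptions: that a node in Clique Join moves to Clique Preservation once admitted, and that it stays in Clique Join until then.

```python
from enum import Enum, auto


class Mode(Enum):
    SELF_TEST = auto()
    CLIQUE_DETECTION = auto()
    CLIQUE_JOIN = auto()
    CLIQUE_INITIALIZATION = auto()
    CLIQUE_PRESERVATION = auto()


def next_mode(mode: Mode, *, failure: bool = False, test_passed: bool = False,
              clique_found: bool = False, admitted: bool = False,
              clique_formed: bool = False) -> Mode:
    """Return the next major mode for a node (illustrative sketch only)."""
    if failure:
        # A local or bus failure sends the node back to Self-Test at any time.
        return Mode.SELF_TEST
    if mode is Mode.SELF_TEST:
        # The node stays in Self-Test until it passes the test.
        return Mode.CLIQUE_DETECTION if test_passed else Mode.SELF_TEST
    if mode is Mode.CLIQUE_DETECTION:
        return Mode.CLIQUE_JOIN if clique_found else Mode.CLIQUE_INITIALIZATION
    if mode is Mode.CLIQUE_JOIN:
        # Assumption: remain in Clique Join until the clique admits the node.
        return Mode.CLIQUE_PRESERVATION if admitted else Mode.CLIQUE_JOIN
    if mode is Mode.CLIQUE_INITIALIZATION:
        return Mode.CLIQUE_PRESERVATION if clique_formed else Mode.CLIQUE_INITIALIZATION
    return Mode.CLIQUE_PRESERVATION


mode = next_mode(Mode.SELF_TEST, test_passed=True)
```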



Figure 2.3: Major operational mode transitions for ROBUS nodes 


2.1.4.1. Clique Preservation

Figure 2.4 illustrates the minor mode transitions for the Clique Preservation major mode. In the 
Schedule Update mode, a schedule-download protocol is executed to allow the PEs to reprogram the bus
according to their communication needs. During PE Communication, first the PE messages are
broadcast according to the communication schedule, and then the BIUs and RMUs exchange accumulated 
accusations against nodes of the opposite kind, which serves to enhance the diagnosis and reconfiguration 
capabilities of the bus. This is followed by a re-synchronization of the local time in the Synchronization 
Preservation mode and then a reassessment of the clique membership in the Collective Diagnosis mode. 


[Diagram: Schedule Update -> PE Communication -> Synchronization Preservation -> Collective Diagnosis]


Figure 2.4: Minor mode transitions for Clique Preservation mode 


2.1.4.2. Self-Test

Upon entering the Self-Test mode, a node disables its output and performs a reset of its circuitry. This 
mode serves as a checkpoint in which the nodes are required to exercise and assess the status of their 
circuitry before attempting to join other nodes on the bus. This mode also provides a safe state to which 
the ROBUS nodes can go after detecting a failure and before attempting to re-engage. 


2.1.4.3. Clique Detection

Figure 2.5 shows the minor mode transitions in the Clique Detection major mode. In Local Diagnosis 
Acquisition, a node uses asynchronous local observations to make a first assessment of the likely 
members of a clique. In Synchronization Acquisition, the node attempts to synchronize to the clique. 
In Collective Diagnosis Acquisition, the node captures the health assessment for each node as 
determined by the clique during the execution of the distributed diagnosis protocol. If at any time during 
the Clique Detection mode the node determines that a valid clique is not present, it will exit this mode and 
attempt to form a new clique. Otherwise, it will assume that a clique exists and will try to join it. 


[Diagram: entry from Self-Test; exits to Clique Join and to Clique Initialization]

Figure 2.5: Minor mode transitions for Clique Detection mode


2.1.4.4. Clique Join

When a node enters the Clique Join mode, its state is in agreement with the state of the clique. In this 
mode, the node runs for two diagnostic cycles, essentially trying to demonstrate that it can be trusted.
The existing members of the clique will integrate the node as soon as they confirm that the admission
rules have been satisfied. 


2.1.4.5. Clique Initialization

Figure 2.6 shows the minor mode transitions for the Clique Initialization major mode. A node 
transitions to the Clique Initialization major mode to form a new clique. The first minor mode is Initial 
Diagnosis, in which a node identifies other nodes that are also attempting to form a new clique. This is 
followed by the Initial Synchronization and Collective Diagnosis minor modes, where the nodes are 
synchronized and a consistent clique membership is established. 


[Diagram: entry from Clique Detection; exit to Clique Preservation]

Figure 2.6: Minor mode transitions for Clique Initialization mode


2.1.5. ROBUS Messages 

The BIUs, RMUs, and PEs communicate using ROBUS Messages (RM). Figure 2.7 illustrates the 
message format, which consists of a Tag field followed by a Payload field. The Tag field has one of two 
values: SPECIAL or DATA. The format and content of the Payload field depends on the value of the Tag 
field and the context in which the message is used. 


Tag Field (1 bit) | Payload Field (fixed number of bits)

Figure 2.7: ROBUS Message format

A SPECIAL message carries a bit pattern corresponding to one of several labels, including the INIT 
and ECHO labels used by the synchronization protocols. 

DATA messages carry data with a context-specific format. For Collective Diagnosis, the Payload 
field of each message carries diagnostic information in the form of a Boolean vector. For Schedule 
Update, the messages carry the number of messages scheduled for a particular PE. For the PE Broadcast 
protocol in the PE Communication mode, the messages carry information from the PEs with an 
application-dependent format. The exchange of accusations after the completion of the scheduled PE 
broadcasts uses the payload format for diagnostic messages. 
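The message layout (a 1-bit Tag field followed by a fixed-width Payload field) can be sketched as a pack/unpack pair. The numeric encoding of SPECIAL and DATA, the placement of the tag in the most significant bit, and all names below are assumptions for illustration; the payload width is passed in as a parameter.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tag(IntEnum):
    SPECIAL = 0  # assumed encoding, not taken from the source
    DATA = 1     # assumed encoding, not taken from the source


@dataclass(frozen=True)
class RobusMessage:
    tag: Tag
    payload: int


def pack(msg: RobusMessage, payload_width: int) -> int:
    """Pack a message into an integer: tag bit first, then the payload."""
    if not 0 <= msg.payload < (1 << payload_width):
        raise ValueError("payload does not fit in the payload field")
    return (int(msg.tag) << payload_width) | msg.payload


def unpack(word: int, payload_width: int) -> RobusMessage:
    """Recover the tag and payload from a packed word."""
    payload = word & ((1 << payload_width) - 1)
    return RobusMessage(Tag(word >> payload_width), payload)


word = pack(RobusMessage(Tag.DATA, 0b1010), payload_width=8)
```

How the payload bits are interpreted (label, Boolean vector, message count, or application data) depends on the tag and the protocol context, as described above.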




2.1.6. Point-to-point communication 


Figure 2.8 illustrates the composition of the one-way communication path between a BIU and an 
RMU. The link transmitter and receiver are part of the Communication Module at the source and receiver 
nodes, respectively. The received messages are stored by the Computation Module at the receiver node 
until the proper time for processing. This arrangement supports all the modes of point-to-point 
communication between BIUs and RMUs: synchronous, fixed-delay, and asynchronous-monitoring. 


[Diagram stages: Send, Transmission, Reception, To processing]



Figure 2.8: Generic point-to-point communication path 

Synchronous communication is used with the synchronous protocols. This is a time-triggered
communication scheme. Synchronous communication requires that the local-time clocks of the source 
and receiver nodes be synchronized within some known bounded precision. Figure 2.9 illustrates the 
main variables. A time T_REF is chosen as a reference to coordinate the send and receive actions. A
message sent at time T_SND with a nominal reception delay R_PP is expected to be received at local time
T_RCV,E. Taking into consideration the local-time skew between the source and the receiver, and the
uncertainty in the reception delay, a message from a trustworthy source should be received within an
expected-reception interval of duration W_RCV centered at T_RCV,E. A message that arrives outside this
interval is considered invalid. A valid received message is buffered until the scheduled time for
processing T_PROC. This buffering corresponds to a deskewing function in which the received message is
synchronized to the local time at the receiving node. 
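The reception-window check for synchronous communication can be sketched as follows. This is a simplified illustration: it assumes all times are expressed in local ticks at the receiver, that the expected reception time is the send time plus the nominal reception delay, and that the window bounds are inclusive. The function and parameter names are hypothetical.

```python
def expected_reception_time(t_snd: int, r_nominal: int) -> int:
    """Expected reception time: send time plus nominal reception delay (assumed)."""
    return t_snd + r_nominal


def is_valid_arrival(t_arrival: float, t_expected: float, window: float) -> bool:
    """True if the arrival falls inside the window centered at t_expected.

    Inclusive bounds are an assumption; the source only states that
    messages arriving outside the interval are invalid.
    """
    half = window / 2
    return t_expected - half <= t_arrival <= t_expected + half


def deskew_delay(t_arrival: float, t_proc: float) -> float:
    """Time a valid message waits in the buffer before it is processed."""
    return t_proc - t_arrival


t_exp = expected_reception_time(t_snd=100, r_nominal=5)
ok = is_valid_arrival(t_arrival=106, t_expected=t_exp, window=4)
wait = deskew_delay(t_arrival=106, t_proc=110)
```

Buffering each valid message until the same processing time is what aligns messages from different sources to the receiver's local time.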


[Timing diagram: local time at the sending node and local time at the receiving node, showing the reception delay and the expected-reception window]

Figure 2.9: Timing for synchronous communication 


Fixed-delay communication is used with the synchronization protocols. For this communication 
mode the information of interest is in the timing of the messages. A transmission is triggered by events at 
the source. At the receiving end, the message is buffered for a predetermined time duration before 
processing it. This communication mode is used only with the INIT and ECHO messages of the
synchronization protocols. For the Synchronization Preservation protocol executed in the Clique 
Preservation and Clique Join modes, it is possible to use local events at the nodes to determine a nominal 
expected time of reception and an expected-reception interval for the synchronization messages. 


Asynchronous-monitoring communication is used by a recovering node to observe the activity on 
the bus before its local time is synchronized. This communication mode does not require coordination 
between a source node (which could be a synchronized clique member) and the receiving recovering
node. Asynchronous monitoring is made possible by the fact that the BIUs and RMUs broadcast their 
transmissions to all the nodes of the opposite kind and the point-to-point communication path allows the 
recovering node to receive messages regardless of its state. The recovering node uses the buffer as a 
fixed-delay queue, and the messages are processed in the order in which they are received. The delay is 
not necessarily the same as for the fixed-delay communication mode used with the synchronization 
protocols. 
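The fixed-delay queue used for asynchronous monitoring can be sketched with a simple FIFO in which each message becomes available a fixed time after its arrival. The class name, method names, and the tick-based time representation are assumptions for illustration only.

```python
from collections import deque


class FixedDelayQueue:
    """FIFO buffer that releases each message `delay` ticks after it arrives."""

    def __init__(self, delay: int) -> None:
        self.delay = delay
        self._queue: deque[tuple[int, object]] = deque()

    def push(self, arrival_time: int, message: object) -> None:
        # Messages are assumed to arrive in non-decreasing time order.
        self._queue.append((arrival_time, message))

    def pop_ready(self, now: int) -> list[object]:
        """Return, in arrival order, all messages whose delay has elapsed."""
        ready = []
        while self._queue and now - self._queue[0][0] >= self.delay:
            ready.append(self._queue.popleft()[1])
        return ready


q = FixedDelayQueue(delay=3)
q.push(0, "a")
q.push(1, "b")
first = q.pop_ready(now=3)
```

Because every message waits the same amount of time, the order of processing matches the order of reception.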

The PE Interface at the BIUs is designed using a first-in first-out (FIFO) buffer abstraction for input 
and output. For input, it is assumed that each PE message is available when expected or there is a 
corresponding error indication. For output, it is always assumed that the message can be output at its 
scheduled time without having to confirm that the PE is ready to receive it. 


2.1.7. Communication patterns 

This section presents the patterns of communication for the ROBUS-2 protocols. The description is 
limited to the sequences of computation processes and message transmissions. The actual computation 
operations performed by the nodes are not described here. [Torres 05] has a complete description of the 
protocols. 


2.1.7.1. Collective Diagnosis

Figure 2.10 shows the communication pattern for the Collective Diagnosis protocol. The circles 
represent the processing done by the nodes. Each arrow represents a single-message broadcast 
transmission from the sources to the receivers. BIUs and RMUs use synchronous communication. The 
results of the protocol are the convictions against BIUs and RMUs, which are stored locally by BIUs and 
RMUs, and forwarded to the PEs. 


[Diagram lanes: PEs, BIUs, RMUs]


Figure 2.10: Message flow graph for the Collective Diagnosis protocol 


2.1.7.2. Schedule Update

Let N denote the number of BIUs, which is assumed to equal the number of PEs connected to the bus.
The PEs are identified according to the statically assigned identification numbers which uniquely identify 
each ROBUS port. The desired schedule is delivered by each PE to its BIU in the form of N consecutive 
messages with the positions in the sequence corresponding to the identification numbers of the PEs and 
the payload fields of the messages indicating the desired number of messages to be broadcast. The 
interval between the send time of one message and the send time of the next (known as the data
introduction interval or DII) [De Micheli 94] is constant. The submitted schedule messages are 
processed using an agreement protocol, called the Schedule Update protocol, to ensure that all the BIU 
and RMU clique members and the PEs agree on the result for each PE. Figure 2.11 shows the message 
flow graph for the Schedule Update protocol. BIUs and RMUs use synchronous communication. The 
protocol is applied independently N times, with each iteration processing the messages delivered by the 
PEs that indicate the number of messages to be broadcast by a particular PE. The result of each protocol 
iteration is sent back to the PEs in process P2. After all the messages have been processed, the ROBUS 
nodes individually assess the resulting schedule. If the new schedule is valid, it is accepted. Otherwise, a 
default schedule known to all the PE, BIU, and RMU nodes is used. 



2.1.7.3. PE Broadcast

In PE Communication mode, the ROBUS grants bus access to individual PEs according to the 
communication schedule. An interactive consistency protocol, called the PE Broadcast protocol, is used 
for each scheduled message to ensure that the PEs receive consistent messages. The bus access pattern is 
a time-indexed, as-soon-as-possible (ASAP) round-robin sequence. Figure 2.12 provides an example of
the access pattern. The PEs access the bus in ascending order according to the port identification
numbers. The first scheduled message is sent at some predetermined time. The DII for PE messages is
constant. After all the scheduled messages for one PE have been sent, the messages for the next PE are
broadcast maintaining the proper DII between messages. If a PE is not scheduled to send messages, then
the messages for the next scheduled PE are sent. This continues until all the scheduled messages have 
been sent. 
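
The access pattern described above can be illustrated with a minimal Python sketch. The function name, the dictionary representation of the schedule, and the numeric values are assumptions made for illustration only; they are not part of the ROBUS-2 specification.

```python
def pe_send_times(schedule, first_send_time, dii):
    """Return (pe_id, send_time) pairs for an ASAP round-robin access pattern.

    schedule: dict mapping a PE identification number to its scheduled
              number of messages (hypothetical representation).
    first_send_time, dii: values in clock ticks (illustrative units).
    """
    slots = []
    t = first_send_time
    for pe_id in sorted(schedule):            # ascending port identification numbers
        for _ in range(schedule[pe_id]):      # a PE with zero messages gets no slot
            slots.append((pe_id, t))
            t += dii                          # constant DII between consecutive messages
    return slots


# Hypothetical schedule: PE 3 is not scheduled, so PE 4 follows PE 2 directly.
example = pe_send_times({1: 2, 2: 1, 3: 0, 4: 1}, first_send_time=100, dii=10)
```

In this sketch an unscheduled PE consumes no bus time, so the next scheduled PE sends exactly one DII after the previous message.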



[Figure 2.12 graphic: access timeline with rows PE 1, PE 2, PE 4, PE 6, and PE 7; horizontal axis: Time]

Figure 2.12: Example of an access pattern during the PE Broadcast service 

Figure 2.13 shows the message flow pattern for the PE Broadcast protocol. This protocol is used to 
process each scheduled PE message. Only the scheduled PE and its corresponding BIU are required to 
send messages. The protocol uses synchronous communication. The result of the protocol is relayed to 
the PEs.

[Figure 2.13 graphic: nodes labeled PEs, BIUs, and RMUs]

Figure 2.13: Message flow graph for the PE Broadcast protocol 

2.1.7.4. Accusation Exchange

The broadcast of PE messages in the PE Communication mode is followed by an exchange of 
accumulated accusations against nodes of the opposite kind using the Accusation Exchange protocol. 
Figure 2.14 shows the message flow pattern. This protocol uses synchronous communication. 



[Figure 2.14 graphic: nodes labeled BIUs and RMUs]

Figure 2.14: Message flow graph for the Accusation Exchange protocol 

2.1.7.5. Synchronization Preservation

Figure 2.15 shows the communication pattern for the Synchronization Preservation protocol. The 
message to be sent by each process is indicated in the figure. Fixed-delay communication is used for all 
the messages. For this protocol, it is possible to use the time of transmission of a message in one process 
to determine an expected time of reception in another process. For example, the RMUs can estimate the 
expected time of reception for process P3 based on the time of transmission in process P1. [Torres 05]
describes in detail how to do this. 




Figure 2.15: Message flow graph for the Synchronization Preservation protocol

2.1.7.6. Local Diagnosis Acquisition

In Local Diagnosis Acquisition, a recovering node monitors the activity on the bus to determine a 
trusted set of opposite-kind nodes operating in the Clique Preservation mode. The recovering node uses
the asynchronous-monitoring communication mode to make its observations. A node in this mode does 
not transmit messages. Figure 2.16 illustrates the nominal message flow for a recovering RMU in a 3x3 
system (i.e., 3 BIUs and 3 RMUs). The PEs and their links are not shown. The solid arrows represent the 
message flow from the BIUs to the recovering node. The dashed lines represent the bidirectional 
communication between the BIUs and the other RMUs. The message flow is similar for a recovering 
BIU. 



Figure 2.16: Message flow in a 3x3 system with a recovering RMU executing Local Diagnosis Acquisition 


2.1.7.7. Synchronization Acquisition

The Synchronization Acquisition mode has two protocols: Frame Synchronization and
Synchronization Capture. The Frame Synchronization protocol monitors the activity on the bus
essentially to find the gap between consecutive executions of the Synchronization Preservation protocol.
Figure 2.16 also applies to this protocol. The recovering node can use fixed-delay or
asynchronous-monitoring communication.

The Synchronization Capture protocol is activated by the completion of the Frame Synchronization
protocol. A recovering node executing Synchronization Capture receives messages only from the
opposite-kind nodes and does not generate any messages. In that sense, Figure 2.16 also applies to this
protocol. Synchronization Capture uses fixed-delay communication applied to ECHO messages from the 
Synchronization Preservation protocol. Figure 2.17 shows the message flow graph for the 
Synchronization Preservation protocol expanded to include the Synchronization Capture processes. As 
shown, a recovering RMU or BIU processes the ECHO messages broadcast between processes P2 and P3, 
or between P3 and P4, respectively. In addition, a recovering BIU executing process P4C also sends an
ECHO message to its attached PE.

Figure 2.17: Message flow graph for Synchronization Preservation with the Synchronization Capture processes


2.1.7.8. Collective Diagnosis Acquisition

A recovering node executing the Collective Diagnosis Acquisition protocol is assumed to be 
synchronized to a clique. The processing in this protocol is essentially the same as for the Collective 
Diagnosis protocol and is executed by a recovering node in parallel with the execution of the Collective 
Diagnosis protocol by the clique members. Figure 2.10 shows the message flow pattern for the Collective
Diagnosis protocol. A recovering node receives messages using the synchronous communication model.
In terms of the communication pattern, the main difference between Collective Diagnosis Acquisition and
Collective Diagnosis is that a recovering node does not broadcast messages during the Collective
Diagnosis Acquisition protocol. In that sense, Figure 2.16 also applies to this protocol.


2.1.7.9. Initial Diagnosis

In the Initial Diagnosis minor mode, the nodes execute a synchronous protocol to determine an initial 
trusted set taking advantage of the known bound on the synchronization precision when operating in the 
unsynchronized state. Figure 2.18 illustrates the message flow pattern. This protocol uses synchronous 
communication. 



Figure 2.18: Message flow graph for the Initial Diagnosis protocol 


2.1.7.10. Initial Synchronization 

The Initial Synchronization protocol is similar to the Synchronization Preservation protocol. The 
differences in processing are the result of the possibly large bound on the relative local-time skew at the 
beginning of the protocol execution. Figure 2.19 shows the message flow pattern. For this protocol, the 
BIUs send an ECHO message to the PEs from process P4, instead of an INIT message from process P2.
This protocol uses fixed-delay communication.

3. Timing models


This section presents the abstract models used in the generic timing analysis of ROBUS. RPP design
constraints and application-specific timing specifications must be applied to the models in order to 
determine the behavior for a particular ROBUS implementation. The results of the analysis can then be 
used to compute the corresponding values for the structural and behavioral parameters of the RPP. 

The following terms are used throughout. A process is a set of interrelated activities or operations 
performed to produce a prescribed output or product. An action is something that is done to produce a 
result or effect. An operation is an action. An operation is composed of one or more operations, also 
called sub-operations. A primitive operation is the simplest type of operation for which further 
decomposition is not of interest. An event is a change in condition or state. An event takes place at an 
instant of time and does not have duration. Time is a nonspatial continuum that is measured in terms of 
events that succeed one another from past through present to future. A clock is a device for measuring 
time, and it contains a physical oscillation mechanism that periodically generates an event called a tick. 
The duration between two consecutive ticks is called the granularity of the clock. A trigger is an event 
that causes the activation of some action. An operation driver is an entity that causes an operation to 
advance by triggering the lower level actions. An event-triggered operation is an operation triggered by 
an event. A time-triggered operation is an operation triggered by a time event. An event-driven 
operation is an operation in which the sub-operations are triggered by events. A time-driven operation 
is an operation in which the sub-operations are triggered by time events. 

The timing models are expressed in terms of parameterized behavior of individual nodes. The 
behavior of groups of nodes can be expressed in terms of bounds on the values of these behavioral 
parameters, as well as bounds on the relative local-time skew. 


3.1. Physical oscillators and local-time clocks 

Each ROBUS node is driven by an independent, free-running physical oscillator (i.e., the phase is not 
controlled in any way by ROBUS) and a logical-time clock (i.e., a counter) that keeps track of the passage 
of time as indicated by the oscillator. An oscillator tick, also called a clock tick or a system tick, is the 
basic unit of time on the bus. In what follows, the term oscillator clock denotes the signal generated by 
the physical oscillator, the local-time clock refers to the logical-time clock, and the local time refers to 
the state of the logical-time clock. The process of synchronizing a signal or a message to the transitions 
of the oscillator clock is referred to as signal synchronization. The process of synchronizing a message 
to the local time is referred to as deskewing. 

Let $f_0$ denote the nominal frequency of an oscillator measured in ticks per second or Hertz (Hz). The
duration of a tick for an ideal oscillator is exactly $1/f_0$ seconds. An ideal oscillator is said to have zero
drift rate with respect to real time since the oscillator perfectly marks the passage of time with a tick
duration of exactly $1/f_0$ seconds. Real oscillators are characterized by non-zero drift rates with respect to
real time. It is assumed that the drift rate of the physical oscillators is bounded by a small constant $\rho_0$,
which is positive, real valued, and unitless. The bound on the drift of the physical oscillators is
interpreted as follows. Let $c_x(T)$ denote the earliest real time at which local-time clock x reaches value $T$.
$c_x(T)$ has units of nominal ticks (1 nominal tick = $1/f_0$ seconds). Let $T_1$ and $T_2$ denote arbitrary values of the
local-time clock with the constraint $T_2 > T_1$. Then:

$(T_2 - T_1)/(1 + \rho_0) \le c_x(T_2) - c_x(T_1) \le (1 + \rho_0)(T_2 - T_1)$    (3.1)


Let $\tau_0$ denote the nominal tick duration measured in seconds (i.e., 1 nominal tick = $\tau_0$ seconds = $1/f_0$
seconds). $\tau_x$ denotes the actual tick duration of local-time clock x. The bound on the drift rate of clock x
can be expressed as follows:

$\tau_0/(1 + \rho_0) \le \tau_x \le (1 + \rho_0)\tau_0$    (3.2)

In other words, the fastest clock has a tick duration of at least $1/(1 + \rho_0)$ nominal ticks, and the slowest
clock has a tick duration of at most $(1 + \rho_0)$ nominal ticks. This model accounts for the drift with respect
to real time of the physical oscillators and the local-time clocks. The point-to-point communication
model accounts for jitter on the output signal of the physical oscillators.
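
As a worked illustration of bounds (3.1) and (3.2), the following Python sketch computes the interval of real time that can elapse while a local-time clock advances between two values, and the range of actual tick durations. The function names and the numeric values are assumptions chosen for illustration.

```python
def real_time_bounds(t1, t2, rho0):
    """Bounds, in nominal ticks, on the real time elapsed while a local-time
    clock advances from value t1 to value t2, following (3.1)."""
    if t2 <= t1:
        raise ValueError("t2 must be greater than t1")
    elapsed_local = t2 - t1
    return elapsed_local / (1 + rho0), elapsed_local * (1 + rho0)


def tick_duration_bounds(f0_hz, rho0):
    """Bounds, in seconds, on the actual tick duration, following (3.2)."""
    tau0 = 1.0 / f0_hz                      # nominal tick duration in seconds
    return tau0 / (1 + rho0), tau0 * (1 + rho0)


# Illustrative values: 1000 local ticks with a drift bound of 1e-4.
lower, upper = real_time_bounds(0, 1000, 1e-4)
```

With these illustrative values the elapsed real time lies within roughly 0.1 nominal tick of 1000 on either side.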


3.2. Drift rate of a clock signal generated by a frequency divider 

We want to determine the drift rate for clock signal y generated by a factor-of-k frequency divider
driven by clock signal x, where k is a positive-integer constant. $f_x$ denotes the actual frequency of clock signal
x, $\tau_x$ denotes the actual tick duration of clock signal x, $\tau_{x,0}$ denotes the nominal tick duration of clock
signal x, $\rho_x^*$ denotes the actual signed drift rate for clock signal x, and $\rho_x$ denotes the absolute value of the
actual drift rate (i.e., $\rho_x = |\rho_x^*|$). Here, $\rho_x^* > 0$ corresponds to a slow clock, and $\rho_x^* < 0$ corresponds to a
fast clock.

The relation between the actual drift rate and the actual tick duration of x is as follows: 


If $\rho_x^* > 0$, then $\tau_x = \tau_{x,0}(1 + \rho_x)$.    (3.3)

If $\rho_x^* < 0$, then $\tau_x = \tau_{x,0}/(1 + \rho_x)$.    (3.4)

Let $\tau_y$ denote the actual tick duration of clock signal y, and let $\rho_y$
denote the absolute value of the actual drift rate for clock signal y. The actual frequency of the derived
clock signal y is $f_y = f_x/k$ and the tick duration is $\tau_y = k\tau_x$. The nominal tick duration of y is $\tau_{y,0} = k\tau_{x,0}$.
Using (3.3) and (3.4), we show that a frequency divider preserves the drift rate of the driving clock signal
as follows:


If $\rho_x^* > 0$, then $\tau_x = \tau_y/k = \tau_{x,0}(1 + \rho_x)$. Thus: $\tau_y = k\tau_{x,0}(1 + \rho_x) = \tau_{y,0}(1 + \rho_x)$.    (3.5)

If $\rho_x^* < 0$, then $\tau_x = \tau_y/k = \tau_{x,0}/(1 + \rho_x)$. Thus: $\tau_y = k\tau_{x,0}/(1 + \rho_x) = \tau_{y,0}/(1 + \rho_x)$.    (3.6)

Thus, clock signal y has the same drift rate as clock signal x (i.e., $\rho_y = \rho_x$). In addition, the bound on
the drift rate for clock x applies to the derived clock y.

From this point on, we use $\rho_0$ as the bound on the drift rate for all clock signals.
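
The derivation in this section can be checked numerically. The Python sketch below applies relations (3.3) and (3.4) to the driving clock, divides by k, and recovers the absolute drift rate of the derived clock. The function names and numeric values are illustrative assumptions; the zero-drift case is folded into the slow-clock branch for convenience, since both branches give the same result there.

```python
def tick_duration(tau_nominal, signed_drift):
    """Actual tick duration per (3.3)/(3.4); a positive signed drift is a slow clock."""
    rho = abs(signed_drift)
    if signed_drift >= 0:                   # zero drift handled here (assumption of this sketch)
        return tau_nominal * (1 + rho)
    return tau_nominal / (1 + rho)


def divided_clock_drift(tau_x_nominal, signed_drift_x, k):
    """Absolute drift rate of the clock produced by a factor-of-k frequency divider."""
    tau_y = k * tick_duration(tau_x_nominal, signed_drift_x)   # actual tick of y
    tau_y_nominal = k * tau_x_nominal                          # nominal tick of y
    ratio = tau_y / tau_y_nominal
    # Invert (3.3) for a slow clock or (3.4) for a fast clock.
    return ratio - 1 if ratio >= 1 else 1 / ratio - 1


# Illustrative check: a 1e-4 drift is unchanged by a divide-by-16 stage.
rho_y = divided_clock_drift(1e-8, -1e-4, 16)
```

For any positive integer k the factor k cancels in the ratio, which is the numerical counterpart of (3.5) and (3.6).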


3.3. Paths to Self-Test mode 

A ROBUS node enters the Self-Test mode either for startup or for restart. The transition to this mode
is triggered by events. For startup, the triggering event is the power-on enable. For restart, the triggering
event is the detection of a failure.


3.3.1. Startup 

For startup, all the nodes that will form a clique are enabled within a time interval of known bounded 
duration. This results in a corresponding bounded relative skew of the local time at power-on enable. It 
is assumed that the nodes begin the Self-Test mode as soon as they are enabled. 

$\delta_{POE}$ denotes the actual duration of the time interval within which the nodes are enabled, and $\delta_{POE}|_{\max}$
denotes the upper bound on $\delta_{POE}$. $\delta_{POE}$ is measured in seconds. $\pi_{POE}$ denotes the upper bound on the
relative time skew at power-on enable. $\pi_{POE}$ is measured in nominal clock ticks. $\delta_{POE}|_{\max}$ and $\pi_{POE}$ are
related as follows:

$\pi_{POE} = \delta_{POE}|_{\max} / \tau_0$    (3.7)
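
As a small worked example of relation (3.7), the Python sketch below converts a power-on enable window given in seconds into a skew bound in nominal clock ticks. The function name, the 1 ms window, and the 50 MHz oscillator frequency are hypothetical values chosen only to illustrate the unit conversion.

```python
def power_on_skew_ticks(delta_poe_max_s, f0_hz):
    """Upper bound on the relative skew at power-on enable, in nominal ticks, per (3.7)."""
    tau0 = 1.0 / f0_hz                 # nominal tick duration in seconds
    return delta_poe_max_s / tau0


# Hypothetical values: nodes enabled within 1 ms, 50 MHz nominal oscillator.
pi_poe = power_on_skew_ticks(1e-3, 50e6)   # approximately 50,000 nominal ticks
```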


3.3.2. Restart 

A node returns to the Self-Test mode to attempt a restart after a failure condition is detected. This 
condition can be caused by random hardware failures or by environmental conditions. Random hardware 
failures generally affect only one node at a time, while environmental conditions have the potential to 
directly affect all the system nodes simultaneously. After detecting a failure, a node is assumed to go 
through a process of local recovery to regain control of its local operation, probably involving a local 
reset, before entering the Self-Test mode. Figure 3.1 shows the sequence of events in the timing model. 
$\delta_{FCP}$ denotes the actual duration of a fault-causing phenomenon measured in nominal clock ticks. $\delta_{FCP}|_{\max}$
denotes the maximum value of $\delta_{FCP}$. $\Delta_{FD}$ denotes the actual duration of the failure-detection delay
measured in local clock ticks. $\delta_{FD}$ denotes the actual duration of the failure-detection delay measured in
nominal clock ticks. $\delta_{FD}|_{\max}$ denotes the maximum value of $\delta_{FD}$. $\Delta_{LR}$ denotes the duration of the local
recovery process measured in units of local clock ticks.


[Figure 3.1 graphic; recovered labels: Beginning of fault-causing phenomenon, Beginning of local recovery; horizontal axis: time]


Figure 3.1: Path to Self-Test for a restart 


3.4. Self-Test mode 

Once in Self-Test mode, the execution can be driven by time or by local events associated with this 
mode. In general, the duration of the Self-Test mode must satisfy the timing requirements for the 
transient-fault scenarios expected to be handled successfully by the implementation.

$\Delta_{STM}$ denotes the duration of the Self-Test mode for a ROBUS node measured in units of local clock
ticks.



[Figure 3.2 graphic; horizontal axis: time]


Figure 3.2: Self-Test mode 


3.5. Clique Detection mode 

The transition to Clique Detection mode is triggered by the end of the Self-Test mode. The Clique 
Detection mode is composed of three minor modes: Local Diagnosis Acquisition (a.k.a., Preliminary 
Diagnosis), Synchronization Acquisition, and Collective Diagnosis Acquisition. Local Diagnosis 
Acquisition is composed of two consecutive observation intervals, each of duration at least as large as a 
re-synchronization interval. Synchronization Acquisition is composed of the Frame Synchronization 
protocol and the Synchronization Capture protocol. Collective Diagnosis Acquisition has the same 
timing characteristics as the Collective Diagnosis protocol in Preservation mode. Figure 3.3 illustrates 
the elements of the model for this mode. 


[Figure 3.3 graphic; recovered label: Beginning of Schedule Update]



Figure 3.3: Clique Detection mode 

$\Delta_{PD,begin}$ denotes the delay from the time a node exits the Self-Test mode until the beginning of the first
observation interval during Local Diagnosis Acquisition, measured in local clock ticks. $\Delta_{PD,OW}$ denotes
the duration of the observation intervals (or “windows”), measured in local clock ticks.

We assume that the Synchronization Capture protocol begins immediately after the Frame
Synchronization protocol is complete. $\Delta_{FS,begin}$ denotes the delay from the end of the second observation
window during Local Diagnosis Acquisition to the beginning of the Frame Synchronization protocol
during Synchronization Acquisition, measured in local clock ticks. $\Delta_{FS}$ denotes the actual duration of the
execution of the Frame Synchronization protocol measured in local clock ticks. $\delta_{FS}$ denotes the actual
duration of the execution of the Frame Synchronization protocol measured in nominal clock ticks. $\Delta_{SC}$
denotes the actual duration of the execution of the Synchronization Capture protocol measured in local
clock ticks. $\delta_{SC}$ denotes the actual duration of the execution of the Synchronization Capture protocol
measured in nominal clock ticks.

Synchronization Acquisition ends with a synchronization reset and the setting of the local clock to 0. 
From this point on, the local time is synchronized to the clique in Preservation mode. The time delay to 
begin the Collective Diagnosis Acquisition protocol in Clique Detection mode is the same as the time 
delay to begin the Collective Diagnosis protocol in Preservation mode. $\Delta_{CD,begin}$ denotes the time from the
synchronization reset to the beginning of the Collective Diagnosis protocol, measured in local clock ticks.
$\Delta_{CD}$ denotes the time to complete the execution of the Collective Diagnosis protocol in local clock ticks.

The transition to the Clique Join mode occurs at the beginning of the execution of the Schedule 
Update protocol. Before that point, a detected failure attributable to the absence of a clique results in a 
transition to the Initialization mode. $\Delta_{SU,begin}$ denotes the time from the end of the Collective Diagnosis
protocol to the beginning of the Schedule Update protocol, measured in local clock ticks. $\Delta_{CDM}$ denotes
the actual duration of the Clique Detection mode for a ROBUS node, measured in units of local clock
ticks. $\delta_{CDM}$ denotes the actual duration of the Clique Detection mode for a ROBUS node, measured in
units of nominal clock ticks.

It is assumed that a ROBUS node can detect the absence of a valid clique at any time after entering the 
Local Diagnosis Acquisition windows. After detecting this condition, a node clears its state and 
transitions to the Clique Initialization mode. $\Delta_{CDM-CIM}$ denotes the delay to transition to the Initialization
mode after detecting the absence of a valid clique, measured in units of local clock ticks. 


3.6. Initialization mode 

The Initialization mode is composed of the Initial Diagnosis, Initial Synchronization, and Collective 
Diagnosis protocols. Figure 3.4 illustrates the model for this mode. 


[Figure 3.4 graphic; recovered label: Sync Reset]

Figure 3.4: Clique Initialization mode

$\Delta_{ID,begin}$ denotes the delay from the time a node enters the Clique Initialization mode until the time it
begins execution of Initial Diagnosis, measured in units of local clock ticks. $\Delta_{ID}$ denotes the actual
duration of the Initial Diagnosis protocol, measured in units of local clock ticks. $\delta_{ID}$ denotes the actual
duration of the Initial Diagnosis protocol, measured in units of nominal clock ticks. $\Delta_{IS,begin}$ denotes the
delay from the time a node completes the Initial Diagnosis protocol until the time it begins execution of
Initial Synchronization, measured in units of local clock ticks. $\Delta_{IS}$ denotes the actual duration of the
Initial Synchronization protocol, measured in units of local clock ticks. $\delta_{IS}$ denotes the actual duration of
the Initial Synchronization protocol, measured in units of nominal clock ticks.

Initial Synchronization ends with a synchronization reset and the setting of the local clock to 0. From 
this point on the local time is synchronized to the newly formed clique. The time to begin the Collective 
Diagnosis protocol in the Clique Initialization mode is the same as the time to begin the Collective 
Diagnosis protocol in the Preservation mode. The parameters for this execution are presented above in 
the section covering the Clique Detection mode. 

The transition to the Clique Preservation mode occurs at the beginning of execution of the Schedule 
Update protocol. 


3.7. Synchronized time-triggered operation 

Once a node has synchronized to a clique, and for as long as it remains synchronized, the execution of 
protocols is triggered exclusively by the local time. The Synchronization Preservation protocol is driven 
by events generated during its execution. The other protocols, known as the synchronous protocols, are 
driven by time. The synchronous protocols are executed in the Collective Diagnosis, Schedule Update, 
and PE Communication minor modes. Figure 3.5 shows the relevant variables.

Figure 3.5: Synchronous Operation


$\Delta_{SU}$ denotes the actual duration of the Schedule Update protocol measured in local clock ticks. $\Delta_{PE,begin}$
denotes the actual time delay from the end of the Schedule Update protocol to the beginning of the PE
Communication protocol, measured in local clock ticks. $\Delta_{PE}$ denotes the actual duration of the PE
Communication mode, measured in local clock ticks. $\Delta_{SP,begin}$ denotes the actual time delay from the end
of the PE Communication protocol to the beginning of the Synchronization Preservation protocol,
measured in local clock ticks. $\Delta_{SP}$ denotes the actual duration of the Synchronization Preservation
protocol, measured in local clock ticks.


3.8. Communication Module


The Communication Module of each ROBUS node is composed of send and receive sub-modules. 
The send sub-module consists of one or more separate transmitters and supports broadcast transmissions. 
The receive sub-module consists of a separate receiver for each node of the opposite kind. 

For ROBUS-2, communication, processing, and diagnosis are based on single-message transactions. 
The size of a Communication Module transaction is a ROBUS message. The communication of each 
ROBUS message can be decomposed into multiple physical transmissions. However, the 
Communication Module should not combine multiple ROBUS messages into a single communication 
unless the timing of such a transaction preserves the current assumptions and specified behavior of the
Computation Module. 

The transmitters and receivers are expected to be generic components supporting event-triggered 
communication. For the transmitters, this means that the reading of a new message and the beginning of 
its transmission process is triggered by a send signal at the transmitter’s input interface. Similarly, the 
receivers should be able to receive new messages whenever they arrive. During normal operation, the 
only expected communication timing constraint at the input interface of the transmitter is the data 
introduction interval. Startup and restart timing constraints for the transmitters and receivers should be 
taken into consideration in the design of the startup and restart behavior for the ROBUS nodes. 

The message delivery delay is the time elapsed from the instant a transmitter receives a send request 
to the instant the message is presented at the receiver’s output interface in the communication model. It is 
assumed that the output of the receiver is not edge synchronous with respect to the clock signal at the 
receiving node (see Section 4). 

The message reception delay is equal to the message delivery delay plus the additional time delay to 
synchronize the received message to the local clock signal at the receiving node. 

Symbol $d_{PPl}$ denotes the minimum point-to-point message delivery delay, measured in nominal clock
ticks. $d_{PPh}$ denotes the maximum point-to-point message delivery delay, measured in nominal clock ticks.
$v_{PP}$ denotes the delivery precision (i.e., uncertainty in the point-to-point delivery delay), measured in
nominal clock ticks. $r_{PPl}$ denotes the minimum point-to-point message reception delay, measured in
nominal clock ticks. $r_{PPh}$ denotes the maximum point-to-point message reception delay, measured in
nominal clock ticks. $e_{PP}$ denotes the reception precision (i.e., uncertainty in the point-to-point reception
delay), measured in nominal clock ticks. $\Delta_{Comm}$ denotes the minimum data introduction interval for a
send port of the Communication Module, measured in local clock ticks.
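
The relationships among these delay parameters can be sketched in Python as follows. This is a minimal illustration under two stated assumptions that the text does not spell out: each precision is taken to be the difference between the corresponding maximum and minimum delays, and the additional synchronization delay at the receiver is represented by hypothetical minimum and maximum values (`sync_min`, `sync_max`). All names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointToPointDelays:
    d_min: float      # minimum delivery delay, nominal ticks
    d_max: float      # maximum delivery delay, nominal ticks
    sync_min: float   # hypothetical minimum delay to synchronize to the local clock
    sync_max: float   # hypothetical maximum delay to synchronize to the local clock

    @property
    def delivery_precision(self):
        # Assumption: uncertainty in the delivery delay is max minus min.
        return self.d_max - self.d_min

    @property
    def r_min(self):
        # Reception delay = delivery delay + synchronization delay.
        return self.d_min + self.sync_min

    @property
    def r_max(self):
        return self.d_max + self.sync_max

    @property
    def reception_precision(self):
        # Assumption: uncertainty in the reception delay is max minus min.
        return self.r_max - self.r_min


# Hypothetical link: 4 to 6 ticks delivery, up to 1 extra tick for synchronization.
link = PointToPointDelays(d_min=4, d_max=6, sync_min=0, sync_max=1)
```

Under these assumptions the reception precision is never smaller than the delivery precision, since synchronization to the local clock can only add uncertainty.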


3.9. Computation Module 

The Computation Module is modeled in terms of two components: the Computation Process and the 
Send Process. The Computation Process handles the processing of received messages according to the 
requirements of the protocol being executed. The Send Process handles the timing and formatting 
requirements for the output messages.

3.9.1. Computation Process

The Computation Process has two stages. The Reception Stage provides timing adjustment and inline 
error detection for the received messages. The Computation Stage performs the reduction of the set of 
received messages into a single result value, and it also handles the cross-lane error detection. 


3.9.1.1. Reception Stage

Depending on the protocol being executed, the Reception Stage provides either fixed or variable 
delays for the input messages. For the synchronization protocols, the Reception Stage has a fixed input-output
delay for each input path. This behavior preserves the relative positions of the received
synchronization messages, which allows the Accept function in the Computation Stage to properly read 
the relative skews of the timing events. In this mode, the delay of the Reception Stage depends mainly on 
the uncertainty in the time of reception and on the time required to process the messages for error 
detection and diagnosis. For the synchronous protocols, the Reception Stage performs a deskewing 
function in order to synchronize the received messages to the local time. The deskewing is applied 
independently to each input path, and the received messages are forwarded to the Computation Stage at a 
predetermined time. The Reception Stage also inspects the messages for error detection. 

For Local Diagnosis Acquisition and Frame Synchronization, the Reception Stage has a fixed input-output
delay for each input path.


3.9.1.2. Computation Stage

The timing of the Computation Stage depends on the protocol being executed. For the 
synchronization protocols, the Accept function produces the output event with a predetermined delay with 
respect to the time it receives the input event to be selected. For the events not selected, the Accept 
function appears to have a variable input-output delay. For the synchronous protocols, the Reception 
Stage forwards all the messages at the same time and the Computation Stage produces its output a fixed 
delay later. In this case, the input-output delay of the Computation Stage is the same for all the inputs. 

The Computation Stage has a fixed input-output delay for Local Diagnosis Acquisition. A worst-case 
delay is used for Frame Synchronization. 


3.9.1.3. Timing model for synchronization protocols

For the synchronization protocols, the Computation Process has a fixed delay from the time when the 
event to be selected is received to the time when the Accept output is asserted. We combine the 
Reception Stage and Computation Stage delays into a single parameter. Symbol A denotes the 
Computation Process delay for synchronization protocols, measured in local clock ticks. 


3.9.1.4. Timing model for the synchronous protocols

For the synchronous protocols, the Reception Stage and Computation Stage operations are time 
triggered. The Reception Stage is allocated a time interval to receive, deskew, and diagnose the input 
messages. There is a predetermined time, referred to as the expected time of reception, that is defined as
the nominal time at which a message should arrive at the Reception Stage. The messages from good
sources should arrive at the Reception Stage within a predetermined time interval (or window) centered at 
the expected time of reception. This interval is called the deskewing window. For expected messages 
arriving during the deskewing window, the deskewing function forwards the received messages at a 
predetermined time after the end of the window. The timing of the Computation Process for synchronous 
protocols is modeled by two delays: the time from the expected time of reception to the closing of the 
deskewing window, and the time from the closing of the deskewing window to the output of the 
Computation Stage. 

$W_{Deskew}$ denotes the size of the deskewing window measured in local clock ticks. $W_{Deskew,pre}$ denotes
the pre-expectation window (i.e., the size of the deskewing window before the expected time of reception,
measured in local clock ticks). $W_{Deskew,post}$ denotes the post-expectation window (i.e., the size of the
deskewing window after the expected time of reception, measured in local clock ticks). C denotes the
processing delay from the closing of the deskewing window to the output of the Computation Stage,
measured in local clock ticks.
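
The deskewing behavior described in this section can be sketched in Python. The sketch assumes that the window boundaries are inclusive and that a message arriving outside the window is simply not forwarded; neither detail is stated in the text, and the function name and numeric values are hypothetical.

```python
def deskewed_output_time(arrival, expected, w_pre, w_post, c_delay):
    """Local time at which the Computation Stage output is produced for a
    message, or None if the message misses the deskewing window.

    All arguments are in local clock ticks. The full window size is
    w_pre + w_post, spanning both sides of the expected time of reception.
    """
    window_open = expected - w_pre
    window_close = expected + w_post
    # Assumption of this sketch: boundaries are inclusive.
    if not (window_open <= arrival <= window_close):
        return None
    # The output time does not depend on where in the window the message arrived.
    return window_close + c_delay


# Hypothetical values: expected reception at tick 100, window of 3 + 3 ticks, C = 5.
early = deskewed_output_time(98, 100, 3, 3, 5)
late = deskewed_output_time(103, 100, 3, 3, 5)
missed = deskewed_output_time(104, 100, 3, 3, 5)
```

Messages arriving early or late within the window yield the same output time, which is the purpose of deskewing: the skew among the received messages is removed before the Computation Stage processes them.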


3.9.2. Send Process 

The Send Process handles the transmission of messages. Once the process has been triggered either 
by an event or by time, the internal operation of the process is driven by the time since the trigger. 


3.9.2.1. Timing model for the synchronization protocols

For the synchronization protocols, the operation of the Send Process for the first protocol transmission 
(i.e., the INIT from the BIUs) is triggered by the local time, while the remaining transmissions are 
triggered by the Accept output events. Symbol B denotes the send delay with respect to the reference
event for synchronization protocols, measured in local clock ticks.


3.9.2.2. Timing model for the synchronous protocols

For the synchronous protocols, the operation of the Send Process is triggered exclusively by time. 
Symbol S denotes the send delay with respect to a reference time, measured in local clock ticks.


3.9.3. Constraint on the data introduction interval for the Computation Module 

The throughput potential of the Computation Module is characterized by its minimum data 
introduction interval. This parameter characterizes the input rate constraints of the Computation Process 
and the Send Process. If the individual processes have different constraints, the larger one is taken as the 
constraint for the Computation Module. $\Delta_{Comp}$ denotes the minimum data introduction interval for the
Computation Module, measured in local clock ticks.
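
The rule for combining the constraints can be written as a short Python sketch. The first function follows the text directly. The second function is an assumption added for illustration: it treats a message DII as sustainable by a node only if it is at least as large as both the Computation Module and the Communication Module minimums. All names and values are hypothetical.

```python
def computation_module_dii(computation_process_dii, send_process_dii):
    """Minimum DII of the Computation Module: the larger of the two process constraints."""
    return max(computation_process_dii, send_process_dii)


def message_dii_is_feasible(message_dii, comp_dii, comm_dii):
    """Assumption of this sketch: a node sustains a message DII only if it is
    no smaller than both module-level minimum DIIs."""
    return message_dii >= max(comp_dii, comm_dii)


# Hypothetical values in local clock ticks.
delta_comp = computation_module_dii(computation_process_dii=8, send_process_dii=6)
```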


3.10. PE Interface 

The BIUs interact with the PEs using a first-in-first-out (FIFO) interface abstraction for input and 
output. It is assumed that the BIUs can access this interface for read and write without the need to
directly coordinate their actions with the PEs. For PE-to-BIU transfers, the PEs are responsible for
making their data available to the BIUs at the input FIFO at or before the time at which it will be read. 
For BIU-to-PE transfers, the BIUs simply write the data to the output FIFO as soon as it is ready. The 
PEs are responsible for ensuring that no data is lost due to a buffer overflow. 

The interaction between PEs and BIUs can be coordinated indirectly by proper selection of the
ROBUS timing parameters.


4. Point-to-point communication


This section examines the point-to-point communication between ROBUS nodes. The 
Communication Module of each ROBUS node is composed of transmit and receive sub-modules. The 
transmit sub-module consists of one or more separate transmitters to support broadcast transmissions. 
The receive sub-module consists of a separate receiver for each node of the opposite kind. The 
transmitters and receivers are expected to be generic components supporting event-triggered 
communication. The granularity of a Communication Module transaction should be a ROBUS Message, 
since the communication, processing, and diagnosis performed by the ROBUS protocols are based on 
single-message transactions. For the transmitters, the reading of a new message and the beginning of its 
transmission process is triggered by a send signal at the transmitter’s input interface. Similarly, the 
receivers should be able to receive new messages whenever they arrive. The only expected 
communication throughput constraint at the input interface of the transmitters is the minimum data 
introduction interval (DII), which is the minimum number of clock ticks between consecutive requests to 
send messages. 

The communication system must be able to support the fixed-delay and synchronous communication 
models. For some receiver designs, the output signals from the receiver are not synchronized to the 
circuitry-driving signal generated by a local physical oscillator. Therefore, the Computation Module must 
synchronize each received message with respect to the local oscillator before proceeding with further 
processing. For the synchronous communication model, the processing of received messages is triggered 
by the local-time clock. Therefore, a node must be able to buffer received messages until it is time to 
process them. The timing design of the system must be able to handle the uncertainty in the time of 
transmission, the transmission delay, and the signal synchronization delay. 

In addition, this version of the ROBUS is intended to demonstrate that the bus can achieve a PE- 
message throughput that approaches the available bandwidth at the physical links. For most 
transmissions, it is possible to compute a local-time interval during which a receiver should expect to 
receive the message. For low link data rates, the reception intervals for individual nodes do not overlap 
and each message can be processed before the next one arrives. For high data rates, the reception 
intervals of consecutive messages overlap and the processing must be pipelined in order to match the link 
throughput. This section examines some critical aspects of pipelined communication. 


4.1. Synchronization of asynchronous signals 

Single-phase edge-triggered flip-flops used as building blocks in traditional synchronous sequential
digital circuits have a simple nominal timing behavior: If the signal at the data input is stable within a 
specified window around the oscillator clock’s triggering edge, then the input value will propagate to the 
output of the flip-flop and stabilize within some guaranteed time. The propagation delay of the flip-flops 
is the time elapsed from the triggering edge of the oscillator clock until the output is stable. The window 
around the oscillator clock’s triggering edge is characterized by the setup and hold time of the flip-flop. 
The setup time is the minimum time that the input signal must remain stable before the triggering edge of 
the oscillator clock in order for the output of the flip-flop to meet the nominal propagation delay. The 
hold time is the minimum time that the input signal must remain stable after the triggering edge of the 
oscillator clock in order for the output of the flip-flop to meet the nominal propagation delay. 

The domain of an oscillator clock includes all the digital circuitry driven by that signal. A signal is
said to be synchronous with respect to a particular oscillator clock if the timing of the signal meets the
input setup and hold time constraints of the flip-flops driven by the oscillator clock. A signal that does 
not meet these constraints is called asynchronous with respect to the given oscillator clock. Since the 
oscillator clocks in the fault containment regions of the ROBUS are independent and the timing of their 
transitions is not coordinated in any way, any signal crossing from one FCR to another is considered 
asynchronous when it arrives at the receiving FCR. 

Asynchronous signals must be synchronized to the oscillator clock before they can be processed. 
Various mechanisms can be used to achieve this synchronization. Ultimately, however, consideration 
must be given to the problem of violations of the setup and hold times of flip-flops reading the signal. A 
flip-flop sampling an input that is not stable within the setup and hold window can enter a metastable 
condition in which the output does not settle to a valid logic state within the nominal propagation delay. 
If not handled properly, this can result in the generation of more asynchronous signals and the 
propagation of errors throughout the receiving FCR. 

The mean time between failures (MTBF) for a flip-flop reading an asynchronous input is (see
[XAPP077]):

$$\mathrm{MTBF} = \frac{e^{C_2 t_{MET}}}{2 C_1 f_D f_C} \tag{4.1}$$

where $t_{MET}$ denotes the time available for the metastability to resolve itself (i.e., the time allowed by
downstream circuitry before reading the output of the flip-flop), $f_D$ denotes the input signal frequency
($2 f_D$ is the input signal event rate), $f_C$ denotes the oscillator clock frequency, $C_1$ denotes the metastability
aperture of the flip-flop (which is related to the width of the window during which an input can cause a
metastability condition), and $C_2$ denotes the resolution rate (which is related to the speed with which the
metastable condition will be resolved). Constants $C_1$ and $C_2$ are functions of the process technology and
flip-flop design. For current technology, the variables of the MTBF can be selected such that the
probability of metastability failures is extremely small.
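
As an illustration of how strongly the resolution time drives equation (4.1), the following minimal Python sketch evaluates the MTBF for a few values of the resolution time. The function name and all numeric constants are hypothetical values chosen only for illustration; they are not taken from this report, from [XAPP077], or from any device data sheet.

```python
import math

def metastability_mtbf(t_met, f_d, f_c, c1, c2):
    """MTBF in seconds per equation (4.1): exp(C2 * t_MET) / (2 * C1 * f_D * f_C)."""
    return math.exp(c2 * t_met) / (2.0 * c1 * f_d * f_c)

# Hypothetical, illustrative constants (assumptions, not measured values).
C1 = 1e-10   # metastability aperture, in seconds
C2 = 2e10    # resolution rate, in 1/seconds
F_D = 1e6    # input signal frequency, in Hz
F_C = 50e6   # oscillator clock frequency, in Hz

for t_met_ns in (1.0, 2.0, 3.0):
    mtbf = metastability_mtbf(t_met_ns * 1e-9, F_D, F_C, C1, C2)
    print(f"t_MET = {t_met_ns} ns -> MTBF = {mtbf:.3e} s")
```

Because the resolution time appears in the exponent, each additional nanosecond in this sketch multiplies the MTBF by the same constant factor, which is why allowing more settling time is the usual lever for making metastability failures negligible.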

In the following analysis it is assumed that the problem of metastability is properly handled by the 
implementation of the ROBUS. Unless explicitly stated otherwise, it is assumed that the nodes have ideal 
signal synchronizers, each consisting of a single flip-flop driven by the oscillator clock. These ideal flip- 
flops have no metastable states and zero propagation delay. The timing behavior is as follows. 

If the input changes before the triggering-edge of the oscillator clock, this latest input value will 
propagate to the output as soon as the triggering-edge of the oscillator clock arrives. If the input 
changes at exactly the same time as the triggering-edge of the oscillator clock, the input value will not 
affect the output until the next triggering-edge of the oscillator clock (assuming that the input remains 
constant). 
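
The timing behavior of the ideal synchronizer can be sketched as a small Python model. This is only an illustration under stated assumptions: the function name `sync_edge`, the regular edge spacing, and the `period` and `first_edge` parameters are hypothetical and are not part of the ROBUS design.

```python
from fractions import Fraction
import math

def sync_edge(t_change, period=Fraction(1), first_edge=Fraction(0)):
    """Real time of the triggering edge at which an input change reaches the output.

    Triggering edges are assumed to occur at first_edge + k * period.
    A change strictly before an edge is captured by that edge; a change
    exactly on an edge is not captured until the next edge.
    """
    k = math.floor((Fraction(t_change) - first_edge) / period) + 1
    return first_edge + k * period

print(sync_edge(Fraction(3, 10)))  # change between edges 0 and 1
print(sync_edge(Fraction(1)))      # change exactly on edge 1
```

In this model the synchronization delay lies in the interval (0, period], which is the source of the one-tick term in the maximum reception delay used later in this section.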


4.2. Single-message communication 

The communication of a message from a source node to a receiver node is modeled as a four-step
process: (1) Send: The Computation Module of the source node signals the transmitter(s) in the
Communication Module that a message is ready for transmission; (2) Transmission: The transmitter
reads the message and transmits the corresponding signals over the transmission medium; (3) Delivery:
The link receiver gets the message from the transmission medium and signals the arrival to the signal
synchronizer; (4) Reception: The synchronizer signals the arrival of a new message to the Computation
Module. Figure 4.1 illustrates the point-to-point communication path. $CLK_{Rx}$ denotes the oscillator clock
at the receiving node. The message delivery delay is the time elapsed from the instant a transmitter
receives a send request until the message is presented at the output interface of the receiver. The message
reception delay is equal to the message delivery delay plus the additional time delay to synchronize the
received message to the oscillator clock at the receiving node.





Figure 4.1: Conceptual point-to-point communication path 

Symbols $d_{PP,l}$ and $d_{PP,h}$ denote the minimum and maximum point-to-point message delivery delays,
respectively, measured in units of nominal clock ticks. $v_{PP}$ denotes the delivery precision (i.e., the
uncertainty in the point-to-point delivery delay) measured in units of nominal clock ticks. $r_{PP,l}$ and $r_{PP,h}$
denote the minimum and maximum point-to-point message reception delays, respectively, measured in
nominal clock ticks. $e_{PP}$ denotes the reception precision (i.e., the uncertainty in the point-to-point
reception delay) measured in nominal clock ticks.

4.2.1. Reception delay 

Let $T_0$ denote the local time at which the source sends the message, and let $t_0$ denote the corresponding
real time. The real-time range of point-to-point message delivery is $[t_0 + d_{PP,l},\ t_0 + d_{PP,h}]$. Therefore, the
delivery precision is:

$$v_{PP} = d_{PP,h} - d_{PP,l} \tag{4.2}$$

The minimum point-to-point message reception delay happens when the message is sampled by the
input synchronizer at exactly the same time it is delivered.

$$r_{PP,l} = d_{PP,l} \tag{4.3}$$

The maximum point-to-point message reception delay happens when the message is sampled by the
input synchronizer exactly one tick after it is delivered. The worst-case delay occurs when the oscillator
clock at the receiving node is slow.

$$r_{PP,h} = d_{PP,h} + (1 + \rho_0) \tag{4.4}$$

The real-time range of reception is $[t_0 + r_{PP,l},\ t_0 + r_{PP,h}]$. Therefore, the reception precision is:

$$e_{PP} = r_{PP,h} - r_{PP,l} = [d_{PP,h} + (1 + \rho_0)] - d_{PP,l} = 1 + \rho_0 + v_{PP} \tag{4.5}$$

$e_{PP}$ accounts for time-discretization errors, jitter and drift of the source and receiver oscillators, as
well as differences in point-to-point communication delays due to differences in the length of
communication wires or optical fibers.

Next, we define $\mathrm{IMP}(x_1, x_2)$, the Integer Mid-Point value (i.e., Rounded Average), as the integer
closest to the mid-point of $x_1$ and $x_2$. $\mathrm{IMP}(x_1, x_2)$ is computed in two steps:

Step 1: $x = (x_1 + x_2)/2$

Step 2: $\mathrm{IMP} = \mathrm{round}(x)$, with $\mathrm{round}(x) = \lfloor x \rfloor$ if $x - \lfloor x \rfloor < 1/2$, or $\lceil x \rceil$ if $x - \lfloor x \rfloor \geq 1/2$

$R_{PP}$ denotes the expected reception delay:

$$R_{PP} = \mathrm{IMP}(r_{PP,l},\ r_{PP,h}) \tag{4.6}$$
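
A worked numeric example may help tie equations (4.3) through (4.6) together. The following Python sketch uses exact rational arithmetic; the function names and the sample delay values are hypothetical, and the tie-breaking rule (a fractional part of exactly one half rounds up) is an assumption of this sketch.

```python
from fractions import Fraction
import math

def imp(x1, x2):
    """Integer Mid-Point: mid-point of x1 and x2 rounded to the nearest integer.

    Assumption of this sketch: a fractional part of exactly 1/2 rounds up.
    """
    x = (Fraction(x1) + Fraction(x2)) / 2
    return math.floor(x + Fraction(1, 2))

def reception_delays(d_l, d_h, rho0):
    """Return (r_l, r_h, e_pp, R_pp) per equations (4.3) to (4.6)."""
    r_l = Fraction(d_l)                        # (4.3)
    r_h = Fraction(d_h) + 1 + Fraction(rho0)   # (4.4)
    e_pp = r_h - r_l                           # (4.5)
    return r_l, r_h, e_pp, imp(r_l, r_h)       # (4.6)

# Hypothetical example: delivery delay between 3 and 5 nominal ticks, drift bound 1e-4.
r_l, r_h, e_pp, R_pp = reception_delays(3, 5, Fraction(1, 10000))
print(float(r_l), float(r_h), float(e_pp), R_pp)
```

Exact fractions are used instead of floating point so that the rounding step is not disturbed by representation error when the mid-point falls on or near a half-tick boundary.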


4.2.2. Estimate of the local-time at the source 

Let $T_{RCV}$ denote the local time at the receiver when it receives the message. To estimate the local time
at the source node, the receiver assumes that the message reception delay is $R_{PP}$ ticks of its oscillator
clock. The estimated local time at the source node at the time of reception is:

$$T_{SRC,E} = T_0 + R_{PP} \tag{4.7}$$

The error in the local-time estimate is bounded as follows. $T_{RCV}$ occurs no earlier than $\mu_{PP,l}$ nominal
ticks before the instant at which the source actually reaches local time $T_{SRC,E}$:

$$\mu_{PP,l} = (1 + \rho_0) R_{PP} - r_{PP,l} \tag{4.8}$$

$T_{RCV}$ occurs no later than $\mu_{PP,h}$ nominal ticks after the instant at which the source actually reaches
local time $T_{SRC,E}$:

$$\mu_{PP,h} = r_{PP,h} - R_{PP}/(1 + \rho_0) \tag{4.9}$$


4.2.3. Expected local time of reception 

Let $\pi_{PP,SR}$ denote a bound on the relative local-time skew between the source and the receiver nodes.
This bound is assumed to hold for the duration of the communication. The expected local time of
reception at the receiver is denoted by $T_{RCV,E}$.

$$T_{RCV,E} = T_0 + R_{PP} \tag{4.10}$$

Due to the relative local-time skew and the uncertainty in the message-reception delay, the message
will arrive within some local-time interval containing $T_{RCV,E}$. Let $\Delta_{PP,RCV}$ denote the local-time error in
$T_{RCV}$:

$$\Delta_{PP,RCV} = T_{RCV} - T_{RCV,E} \tag{4.11}$$

We want to determine the absolute maximum local-time error in $T_{RCV}$, denoted by $\Delta_{PP,RCV}|_{\text{abs-max}}$:

$$|T_{RCV} - T_{RCV,E}| \leq \Delta_{PP,RCV}|_{\text{abs-max}} \tag{4.12}$$

The value of $\Delta_{PP,RCV}|_{\text{abs-max}}$ is derived as follows. The bound on the local-time synchronization
between the source and the receiver nodes is expressed as:

$$|c_{SRC}(T) - c_{RCV}(T)| \leq \pi_{PP,SR} \tag{4.13}$$

where $c_{SRC}(T)$ and $c_{RCV}(T)$ denote the earliest real times at which the local times at the source and at the
receiver, respectively, reach value $T$. From the previous analysis, it is known that the real-time difference
between the time when the source reaches $T_{RCV,E}$ and the time when the message is actually received, $T_{RCV}$,
is bounded above and below by $\mu_{PP,h}$ and $\mu_{PP,l}$, respectively.

$$c_{SRC}(T_{RCV,E}) - \mu_{PP,l} \leq c_{RCV}(T_{RCV}) \leq c_{SRC}(T_{RCV,E}) + \mu_{PP,h} \tag{4.14}$$

For local time $T_{RCV,E}$, inequality (4.13) can be re-expressed as:

$$c_{SRC}(T_{RCV,E}) - \pi_{PP,SR} \leq c_{RCV}(T_{RCV,E}) \leq c_{SRC}(T_{RCV,E}) + \pi_{PP,SR} \tag{4.15}$$

Combining inequalities (4.14) and (4.15), we get:

$$-\pi_{PP,SR} - \mu_{PP,l} \leq c_{RCV}(T_{RCV}) - c_{RCV}(T_{RCV,E}) \leq \pi_{PP,SR} + \mu_{PP,h} \tag{4.16}$$

So:

$$|c_{RCV}(T_{RCV}) - c_{RCV}(T_{RCV,E})| \leq \max(\pi_{PP,SR} + \mu_{PP,l},\ \pi_{PP,SR} + \mu_{PP,h}) \tag{4.17}$$

Equivalently:

$$|c_{RCV}(T_{RCV}) - c_{RCV}(T_{RCV,E})| \leq \pi_{PP,SR} + \max(\mu_{PP,l},\ \mu_{PP,h}) \tag{4.18}$$

Using the constraint that the local clocks are $\rho$-bounded, the definition of $\Delta_{PP,RCV}$, and the real-time
duration of $\Delta_{PP,RCV}$ ticks for the fastest allowed clock:

$$|\Delta_{PP,RCV}|/(1 + \rho_0) \leq |c_{RCV}(T_{RCV}) - c_{RCV}(T_{RCV,E})| \tag{4.19}$$

Combining (4.18) and (4.19):

$$|\Delta_{PP,RCV}| \leq (1 + \rho_0)(\pi_{PP,SR} + \max(\mu_{PP,l},\ \mu_{PP,h})) \tag{4.20}$$

Since $\Delta_{PP,RCV}$ is an integer, we can take the floor in (4.20):

$$|\Delta_{PP,RCV}| \leq \lfloor (1 + \rho_0)(\pi_{PP,SR} + \max(\mu_{PP,l},\ \mu_{PP,h})) \rfloor \tag{4.21}$$
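
The bound in (4.21) can be evaluated numerically by chaining equations (4.8), (4.9), and (4.20). The following Python sketch does so with exact rational arithmetic; the function name and the sample parameter values are hypothetical and serve only to illustrate the calculation.

```python
from fractions import Fraction
import math

def max_local_time_error(r_l, r_h, R_pp, pi_sr, rho0):
    """Floor bound of (4.21) on the local-time error of reception, in local clock ticks."""
    rho0 = Fraction(rho0)
    mu_l = (1 + rho0) * R_pp - Fraction(r_l)             # (4.8)
    mu_h = Fraction(r_h) - Fraction(R_pp) / (1 + rho0)   # (4.9)
    bound = (1 + rho0) * (Fraction(pi_sr) + max(mu_l, mu_h))  # (4.20)
    return math.floor(bound)                             # (4.21)

# Hypothetical example: reception delay in [3, 6.0001] ticks, expected delay 5 ticks,
# local-time skew bound of 2 ticks, drift bound 1e-4.
rho0 = Fraction(1, 10000)
print(max_local_time_error(3, 6 + rho0, 5, 2, rho0))
```

In this example the earlier-than-expected bound dominates (about 2 ticks against about 1 tick), so the result is driven by the skew bound plus the early-arrival term.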

Therefore, the worst-case local-time difference between the actual time of reception $T_{RCV}$ and the
expected time of reception $T_{RCV,E}$ is:

$$\Delta_{PP,RCV}|_{\text{abs-max}} = \lfloor (1 + \rho_0)(\pi_{PP,SR} + \max(\mu_{PP,l},\ \mu_{PP,h})) \rfloor \tag{4.22}$$


4.3. Coordination for synchronous communication

For the synchronous ROBUS protocols, the scheduling of operations is based on a distributed 
synchronous composition abstract model of the system in which a single oscillator drives a common 
local-time clock and fixed-delay processes corresponding to the communication and computation 
operations of the BIUs and RMUs. Communication during time-driven operations is time-triggered. For
each transmission, the sources and receivers use a particular local-time value as a distributed reference 
event to coordinate their actions. Given specific bounds for the reception delay and the relative local-time 
skew between sources and receivers, it is possible to coordinate the send and receive operations such that 
the transmitted messages are received within a predetermined local-time range measured at the receivers. 
The receivers can then apply a deskewing function and forward the received messages for processing at a 
predetermined local time. By leveraging the previous analysis, it is possible to analyze the source- 
receiver coordination problem using only global time (i.e., synchronized local time viewed from a global 
perspective). Figure 4.2 illustrates the relevant timing events. $T_{REF}$ denotes the reference local-time
value. $T_{SND}$ is the time at which the message is sent. $R_{PP}$ is the expected reception delay. $T_{RCV,E}$ is the
expected time of reception. $W_{Deskew}$ is the size of the deskewing window. $W_{Deskew,pre}$ is the pre-expectation
window (i.e., the size of the section of the deskewing window before the expected time of
reception). $W_{Deskew,post}$ is the post-expectation window (i.e., the size of the section of the deskewing
window after the expected time of reception). $T_{PROC,begin}$ denotes the time for the beginning of message processing.





Figure 4.2: Timing events for point-to-point communication 

A message from a good source is expected to arrive during the following closed time interval, which
includes all triggering edges of the local clock within the expected time range of reception:

$$[T_{RCV,E} - \Delta_{PP,RCV}|_{\text{abs-max}},\ T_{RCV,E} + \Delta_{PP,RCV}|_{\text{abs-max}}] \tag{4.23}$$

The deskewing window includes all triggering edges of the clock within the expected time range of
reception, a total of $2\Delta_{PP,RCV}|_{\text{abs-max}} + 1$ edges. The deskewing window is intended to cover the duration of
all local clock counts corresponding to these triggering edges. These local clock counts determine a time
interval with a duration of $2\Delta_{PP,RCV}|_{\text{abs-max}} + 1$ ticks. The deskewing window extends for the real-time interval
corresponding to the following half-closed local-time interval:

$$[T_{RCV,E} - \Delta_{PP,RCV}|_{\text{abs-max}},\ T_{RCV,E} + \Delta_{PP,RCV}|_{\text{abs-max}} + 1) \tag{4.24}$$

So:

$$W_{Deskew} = 2\Delta_{PP,RCV}|_{\text{abs-max}} + 1 \tag{4.25}$$

And:

$$W_{Deskew,pre} = \Delta_{PP,RCV}|_{\text{abs-max}} \tag{4.26}$$

$$W_{Deskew,post} = \Delta_{PP,RCV}|_{\text{abs-max}} + 1 \tag{4.27}$$
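
The window sizes in (4.25) through (4.27) and the half-closed interval in (4.24) can be sketched in Python as follows. The function name and the use of a `range` object to represent the local clock counts covered by the window are choices made for this illustration only.

```python
def deskew_window(t_rcv_e, delta_abs_max):
    """Return (W, W_pre, W_post, ticks) for an expected reception time t_rcv_e.

    ticks is the half-closed local-time interval of (4.24), represented as a
    Python range of local clock counts (the upper end is excluded).
    """
    w_pre = delta_abs_max        # (4.26)
    w_post = delta_abs_max + 1   # (4.27)
    w = w_pre + w_post           # (4.25): 2 * delta_abs_max + 1
    ticks = range(t_rcv_e - w_pre, t_rcv_e + w_post)
    return w, w_pre, w_post, ticks

# Hypothetical example: expected reception at local time 100, worst-case error of 4 ticks.
w, w_pre, w_post, ticks = deskew_window(100, 4)
print(w, w_pre, w_post, ticks[0], ticks[-1])
```

A reception time can then be checked against the window with a simple membership test such as `t_rcv in ticks`.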

For proper communication, the following constraints must be satisfied:

$$T_{REF} \leq T_{SND} \tag{4.28}$$

$$T_{REF} \leq T_{RCV,E} - W_{Deskew,pre} \tag{4.29}$$

$$T_{RCV,E} = T_{SND} + R_{PP} \tag{4.30}$$

$$T_{RCV,E} + W_{Deskew,post} \leq T_{PROC,begin} \tag{4.31}$$

Relations (4.28) and (4.29) express basic time constraints for the common reference time. Relation (4.30)
captures the goal of source-receiver coordination, which is to receive the message at the expected time of
reception. Relation (4.31) is only relevant to the composition of operations at the receiving node. Let
$\Delta_{\text{REF-SND}}$ denote the delay from $T_{REF}$ to $T_{SND}$ measured in local clock ticks.

$$\Delta_{\text{REF-SND}} = T_{SND} - T_{REF} \tag{4.32}$$

Let $\Delta_{\text{REF-RCVWND}}$ denote the delay from $T_{REF}$ to $T_{RCV,E} - W_{Deskew,pre}$ measured in local clock ticks.

$$\Delta_{\text{REF-RCVWND}} = (T_{RCV,E} - W_{Deskew,pre}) - T_{REF} \tag{4.33}$$

Let $\Delta_{\text{REF-SND}}|_{\min}$ and $\Delta_{\text{REF-RCVWND}}|_{\min}$ denote the minimum possible values for $\Delta_{\text{REF-SND}}$ and $\Delta_{\text{REF-RCVWND}}$,
respectively. Relation (4.30) can be re-expressed as follows:

$$\Delta_{\text{REF-SND}} + R_{PP} = \Delta_{\text{REF-RCVWND}} + W_{Deskew,pre} \tag{4.34}$$

We are interested in finding the values for $\Delta_{\text{REF-SND}}$ and $\Delta_{\text{REF-RCVWND}}$ to achieve the earliest communication
satisfying (4.28), (4.29), and (4.30). We consider two cases.

Case 1: $\Delta_{\text{REF-SND}}|_{\min} + R_{PP} \geq \Delta_{\text{REF-RCVWND}}|_{\min} + W_{Deskew,pre}$

For this case, the message can be sent as soon as possible, but the window must be delayed to align it
with the expected time of reception.

$$\Delta_{\text{REF-SND}} = \Delta_{\text{REF-SND}}|_{\min} \tag{4.35}$$

$$\Delta_{\text{REF-RCVWND}} = \Delta_{\text{REF-SND}}|_{\min} + R_{PP} - W_{Deskew,pre} \tag{4.36}$$

Case 2: $\Delta_{\text{REF-SND}}|_{\min} + R_{PP} < \Delta_{\text{REF-RCVWND}}|_{\min} + W_{Deskew,pre}$

For this case, the window can be opened as soon as possible, but the message must be delayed to
achieve proper alignment.

$$\Delta_{\text{REF-SND}} = \Delta_{\text{REF-RCVWND}}|_{\min} + W_{Deskew,pre} - R_{PP} \tag{4.37}$$

$$\Delta_{\text{REF-RCVWND}} = \Delta_{\text{REF-RCVWND}}|_{\min} \tag{4.38}$$
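
The two-case selection in (4.35) through (4.38) can be sketched as a single Python function. The function and parameter names are hypothetical; all quantities are assumed to be integer counts of local clock ticks.

```python
def coordinate(snd_min, rcvwnd_min, r_pp, w_pre):
    """Return the earliest (delay to send, delay to window opening) satisfying (4.34).

    snd_min and rcvwnd_min are the minimum possible delays from the reference
    time to the send event and to the opening of the deskewing window.
    """
    if snd_min + r_pp >= rcvwnd_min + w_pre:
        # Case 1: send as early as possible and delay the window, (4.35)-(4.36).
        return snd_min, snd_min + r_pp - w_pre
    # Case 2: open the window as early as possible and delay the send, (4.37)-(4.38).
    return rcvwnd_min + w_pre - r_pp, rcvwnd_min

# Hypothetical examples with R_PP = 5 ticks and a pre-expectation window of 4 ticks.
print(coordinate(2, 1, 5, 4))  # Case 1
print(coordinate(0, 3, 5, 4))  # Case 2
```

In both cases the returned pair satisfies the alignment relation (4.34), and neither delay falls below its stated minimum.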

4.4. Message streams 

Each message in a message stream is processed independently. Let $K$ denote the total number of
messages in the stream. $i$ denotes the index for the messages in the stream, with $0 \leq i \leq K-1$. $T_{SND,i}$ is the
local time at which the source sends the i-th message of the stream. $T_{RCV,E,i}$ is the expected local time of
reception for the i-th message of the stream. $\Delta_{stream}$ denotes the data introduction interval at the source
measured in local clock ticks.

The throughput capacities of the Communication Module and the Computation Module are
characterized by their respective minimum data introduction intervals [De Micheli 94]. Let $\Delta_{Comm}$ and
$\Delta_{Comp}$ denote the minimum data introduction interval for the Communication Module and the
Computation Module, respectively. $\Delta_{Comm}$ and $\Delta_{Comp}$ are measured in local clock ticks. For proper
processing, $\Delta_{stream}$ must be greater than or equal to both $\Delta_{Comm}$ and $\Delta_{Comp}$:

$$\Delta_{stream} \geq \max(\Delta_{Comm},\ \Delta_{Comp}) \tag{4.39}$$


4.4.1. Message delivery rate 

We would like to compute the number of messages that can be delivered during a particular time 
interval. We consider intervals during steady state transmission after the leading edge of the stream and 
before the trailing edge. Because of the drift rate of the clocks, the observed number of delivered 
messages can vary within a range. 

Let $W_{RCV}$ denote the size of the observation window at the receiving node measured in local clock
ticks. $Q$ denotes the number of messages delivered during the observation window. $\lambda_{SRC}$ denotes the data
introduction interval measured in nominal clock ticks. $w_{RCV}$ denotes the size of the observation window
at the receiving node measured in nominal clock ticks. Let $t_{deliver,i}$ denote the real time at which message $i$
is delivered.

$$t_{deliver,i} = t_{deliver,0} + i\lambda_{SRC} \tag{4.40}$$

Let $t_{obs,l}$ and $t_{obs,h}$ denote the beginning and end times, respectively, for the observation window. The
observer records received messages during the closed interval $[t_{obs,l},\ t_{obs,h}]$. $t_{obs,l}$ and $t_{obs,h}$ are related by the
size of the observation window.

$$t_{obs,h} = t_{obs,l} + w_{RCV} \tag{4.41}$$

The following constraints are applied in order to determine the number of observed messages.

$$t_{deliver,0} < t_{obs,l} \tag{4.42}$$

$$t_{deliver,1} \geq t_{obs,l} \tag{4.43}$$

$$t_{deliver,Q} \leq t_{obs,h} \tag{4.44}$$

$$t_{deliver,Q+1} > t_{obs,h} \tag{4.45}$$

For these constraints, a total of $Q$ messages in the index range 1 to $Q$ are delivered within the
observation interval. The maximum value of $Q$ is derived as follows. Relation (4.44) can be re-expressed
as:

$$t_{deliver,0} + Q\lambda_{SRC} \leq t_{obs,l} + w_{RCV} \tag{4.46}$$

So:

$$Q \leq [(t_{obs,l} - t_{deliver,0}) + w_{RCV}]/\lambda_{SRC} \tag{4.47}$$

The right-hand side reaches its maximum value when $t_{obs,l} - t_{deliver,0} = \lambda_{SRC}$. In that case, $t_{deliver,1} = t_{obs,l}$. So:

$$Q \leq [\lambda_{SRC} + w_{RCV}]/\lambda_{SRC} \tag{4.48}$$

Since $Q$ is an integer, we can take the floor on the right-hand side of the expression. Then:

$$Q|_{\max} = \lfloor w_{RCV}/\lambda_{SRC} \rfloor + 1 \tag{4.49}$$

For a fast source clock:

$$\lambda_{SRC,fast} = \Delta_{stream}/(1 + \rho_0) \tag{4.50}$$

For a slow receiver clock:

$$w_{RCV,slow} = (1 + \rho_0) W_{RCV} \tag{4.51}$$

Therefore, for the maximum value of $Q$:

$$Q|_{\max} = \lfloor (W_{RCV}/\Delta_{stream})(1 + \rho_0)^2 \rfloor + 1 \tag{4.52}$$

The minimum value of $Q$ is derived as follows. Relation (4.45) can be re-expressed as:

$$t_{deliver,0} + (Q + 1)\lambda_{SRC} > t_{obs,l} + w_{RCV} \tag{4.53}$$

So:

$$Q > [(t_{obs,l} - t_{deliver,0}) + w_{RCV}]/\lambda_{SRC} - 1 \tag{4.54}$$

The right-hand side approaches its minimum value as $t_{obs,l} - t_{deliver,0}$ approaches 0. So:

$$Q > w_{RCV}/\lambda_{SRC} - 1 \tag{4.55}$$

$Q$ is an integer strictly larger than $w_{RCV}/\lambda_{SRC} - 1$. The smallest integer that satisfies this relation is given
by:

$$Q|_{\min} = \lfloor w_{RCV}/\lambda_{SRC} \rfloor \tag{4.56}$$

For a slow source clock:

$$\lambda_{SRC,slow} = (1 + \rho_0)\Delta_{stream} \tag{4.57}$$

For a fast receiver clock:

$$w_{RCV,fast} = W_{RCV}/(1 + \rho_0) \tag{4.58}$$

Therefore, for the minimum value of $Q$:

$$Q|_{\min} = \lfloor (W_{RCV}/\Delta_{stream})/(1 + \rho_0)^2 \rfloor \tag{4.59}$$
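
The range of observed message counts given by (4.52) and (4.59) can be computed with a short Python sketch. The function name and the sample values are hypothetical; exact rational arithmetic is used so that the floor operations are not affected by floating-point error.

```python
from fractions import Fraction
import math

def delivered_message_range(w_rcv, delta_stream, rho0):
    """Return (Q_min, Q_max) per equations (4.59) and (4.52).

    w_rcv and delta_stream are in local clock ticks; rho0 is the drift bound.
    """
    ratio = Fraction(w_rcv) / Fraction(delta_stream)
    drift = (1 + Fraction(rho0)) ** 2
    q_max = math.floor(ratio * drift) + 1   # fast source, slow receiver
    q_min = math.floor(ratio / drift)       # slow source, fast receiver
    return q_min, q_max

# Hypothetical example: 1000-tick window, one message every 10 ticks, drift bound 1e-4.
print(delivered_message_range(1000, 10, Fraction(1, 10000)))
```

Even with zero drift the two values differ by one, because the count depends on how the observation window is aligned with the message deliveries.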

4.4.2. Expected local time of reception 

The transmission times for the messages are related by the data introduction interval:

$$T_{SND,i} = T_{SND,0} + i\Delta_{stream} \tag{4.60}$$

At the receiver, the relation among the messages is similar.

$$T_{RCV,E,i} = T_{RCV,E,0} + i\Delta_{stream} \tag{4.61}$$

Using the analysis for single-message communication:

$$T_{RCV,E,i} = T_{SND,i} + R_{PP} \tag{4.62}$$

Let $T_{RCV,i}$ denote the actual time of reception for the i-th message. From the analysis of single-message
communication, $T_{RCV,i}$ and $T_{RCV,E,i}$ are related as follows:

$$|T_{RCV,i} - T_{RCV,E,i}| \leq \Delta_{PP,RCV}|_{\text{abs-max}} \tag{4.63}$$

Re-expressing (4.63):

$$T_{RCV,E,i} - \Delta_{PP,RCV}|_{\text{abs-max}} \leq T_{RCV,i} \leq T_{RCV,E,i} + \Delta_{PP,RCV}|_{\text{abs-max}} \tag{4.64}$$

The stream as a whole should be received within the following local time interval:

$$[T_{RCV,E,0} - \Delta_{PP,RCV}|_{\text{abs-max}},\ T_{RCV,E,K-1} + \Delta_{PP,RCV}|_{\text{abs-max}}] \tag{4.65}$$

Re-expressing (4.65):

$$[T_{RCV,E,0} - \Delta_{PP,RCV}|_{\text{abs-max}},\ T_{RCV,E,0} + (K-1)\Delta_{stream} + \Delta_{PP,RCV}|_{\text{abs-max}}] \tag{4.66}$$


4.4.3. Message reception rate 

The $\Delta_{stream}$ communication parameter gives the nominal message reception rate for the stream in units
of ticks per message. An important consideration for the processing of message streams is the relation
between $\Delta_{stream}$ and $\Delta_{PP,RCV}|_{\text{abs-max}}$. As presented above, $\Delta_{PP,RCV}|_{\text{abs-max}}$ measures the uncertainty in the time
of reception of each message. In particular, the total uncertainty in the time of reception for a particular
message is $2\Delta_{PP,RCV}|_{\text{abs-max}}$ local clock ticks centered around the expected time of reception. A message
from a good source can be received at any of the $2\Delta_{PP,RCV}|_{\text{abs-max}} + 1$ triggering edges of the oscillator
clock in the corresponding reception interval. Let $Z$ denote the number of messages from a good source
that can be received during a $2\Delta_{PP,RCV}|_{\text{abs-max}}$ interval. Then:

$$Z \leq \lfloor 2\Delta_{PP,RCV}|_{\text{abs-max}}/\Delta_{stream} \rfloor + 1 \tag{4.67}$$


4.4.3.1. Non-overlapping reception intervals

If $\Delta_{stream} > 2\Delta_{PP,RCV}|_{\text{abs-max}}$, the expected reception intervals for consecutive messages do not overlap or
even coincide end-to-end (i.e., no shared triggering edges in consecutive expected reception intervals).
For this case, $Z = 1$, which means that the messages of the stream are received as separate
communications with no interaction.


4.4.3.2. Overlapping reception intervals

If $\Delta_{stream} \leq 2\Delta_{PP,RCV}|_{\text{abs-max}}$, the expected reception intervals for consecutive messages overlap or
coincide at the ends. For this case, $Z > 1$, which means that the interaction between the messages must be
taken into consideration. This is especially important for the diagnosis of timing errors.
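
The distinction between the two cases follows directly from the bound in (4.67), as the following Python sketch shows. The function names are hypothetical, and both arguments are assumed to be integer counts of local clock ticks.

```python
def max_messages_in_uncertainty_interval(delta_abs_max, delta_stream):
    """Upper bound on Z per equation (4.67)."""
    return (2 * delta_abs_max) // delta_stream + 1

def reception_intervals_overlap(delta_abs_max, delta_stream):
    """True when consecutive expected reception intervals overlap or coincide at the ends."""
    return delta_stream <= 2 * delta_abs_max

# Hypothetical example with a worst-case reception error of 4 ticks.
for delta_stream in (10, 8, 3):
    print(delta_stream,
          max_messages_in_uncertainty_interval(4, delta_stream),
          reception_intervals_overlap(4, delta_stream))
```

With a worst-case error of 4 ticks, a stream interval of 10 ticks keeps the bound at one message per interval, while intervals of 8 or 3 ticks allow two or three messages to fall within the same uncertainty interval.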


4.4.4. Load size for a message reception buffer 

We refer to the number of messages stored in a buffer as the load on the buffer. The function of the
message receive buffer is to collect the messages received at the Computation Process. For single-message
communication, it is expected that the processing of each message will begin at or before the time the
next message is received. The same can occur for a message stream in which the reception intervals for
consecutive messages do not overlap. In these cases, the load of the receive buffer is less than or equal to 1.
From this point on, we only consider cases in which the processing of individual messages may begin
after the reception of subsequent messages in the stream. This includes cases of overlapping and
non-overlapping reception intervals.

Let $\Delta_{PROC,begin}$ denote the delay in the beginning of processing of a message with respect to the
corresponding expected time of reception. We assume that the interval between the beginning of
processing of consecutive messages is the same as the data introduction interval for the message stream,
$\Delta_{stream}$. $T_{PROC,i}$ denotes the local time at the beginning of processing for message $i$.

$$T_{PROC,i} = T_{RCV,E,i} + \Delta_{PROC,begin} \tag{4.68}$$

4.4.4.1. Combined message synchronization and buffering

Figure 4.3 illustrates the interconnection of functions for this case. $CLK_{Rx}$ denotes the oscillator clock
at the receiving node. $STB_{Rx}$ denotes the strobe signal indicating that a new message is ready. The Link
Receiver transfers the messages to the Receive Buffer as soon as they are ready. The output of the
receiver is assumed to be asynchronous with respect to the oscillator clock. The Receive Buffer is an
asynchronous FIFO, which means that the push (i.e., write) and pop (i.e., remove) action signals are
synchronous with respect to different clock signals. In effect, in addition to being a buffer, the
asynchronous FIFO serves as a signal synchronizer for data crossing from one clock domain to the other.
Note that the data is read for computation one tick before it is popped from the receive buffer.

Figure 4.3: Reception using combined message synchronization and buffering

In order to ensure a read-after-write sequence at the Receive Buffer during normal operation, the
reading of a particular message by the Computation Process should be triggered after the end of the
corresponding reception interval. The following relation must hold in order to satisfy this property.

$$\Delta_{PROC,begin} > \Delta_{PP,RCV}|_{\text{abs-max}} \tag{4.69}$$

Again, note that the pop takes place one tick after the start of processing for each message. $t_{deliver,i}$
denotes the real time at which message $i$ is written to the buffer. A message gets pushed at the same time
that it is delivered. With $\lambda_{SRC}$ denoting the data introduction interval at the source node, $t_{deliver,i}$ is given by
the following equation.

$$t_{deliver,i} = t_{deliver,0} + i\lambda_{SRC} \tag{4.70}$$

Let $t_{pop,i}$ denote the real time at which message $i$ is popped from the buffer. The pop times for the
Computation Process are given by the following relation, with $\lambda_{RCV}$ equal to the data introduction interval
at the Computation Process.

$$t_{pop,i} = t_{pop,0} + i\lambda_{RCV} \tag{4.71}$$

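Let $Q_{deliver}(t)$ denote the number of delivered messages by time $t$. $Q_{pop}(t)$ denotes the number of
popped messages by time $t$. $Q_{\text{Async-Buffer}}(t)$ denotes the number of messages held by the asynchronous
Receive Buffer at time $t$.

For $Q_{deliver}(t)$:

$$Q_{deliver}(t) = \begin{cases} 0, & \text{for } t < t_{deliver,0} \\ \lfloor [(t - t_{deliver,0})/\lambda_{SRC}] + 1 \rfloor, & \text{for } t_{deliver,0} \leq t \leq t_{deliver,0} + (K-1)\lambda_{SRC} \\ K, & \text{for } t > t_{deliver,0} + (K-1)\lambda_{SRC} \end{cases} \tag{4.72}$$

For $Q_{pop}(t)$:

$$Q_{pop}(t) = \begin{cases} 0, & \text{for } t < t_{pop,0} \\ \lfloor [(t - t_{pop,0})/\lambda_{RCV}] + 1 \rfloor, & \text{for } t_{pop,0} \leq t \leq t_{pop,0} + (K-1)\lambda_{RCV} \\ K, & \text{for } t > t_{pop,0} + (K-1)\lambda_{RCV} \end{cases} \tag{4.73}$$

For $Q_{\text{Async-Buffer}}(t)$:

$$Q_{\text{Async-Buffer}}(t) = Q_{deliver}(t) - Q_{pop}(t) \tag{4.74}$$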
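
The piecewise counting functions in (4.72) and (4.73) share the same form, so a single helper can express both. The following Python sketch is an illustration only; the function names and the sample timing values are hypothetical.

```python
from fractions import Fraction
import math

def count_events(t, t_first, interval, k):
    """Number of events by real time t, for k events at t_first + i * interval.

    This is the common form of (4.72) and (4.73): 0 before the first event,
    floor((t - t_first) / interval + 1) during the stream, and k afterwards.
    """
    t, t_first, interval = Fraction(t), Fraction(t_first), Fraction(interval)
    if t < t_first:
        return 0
    return min(k, math.floor((t - t_first) / interval) + 1)

def buffer_load(t, t_deliver0, lam_src, t_pop0, lam_rcv, k):
    """Messages held in the receive buffer at time t, per (4.74)."""
    return count_events(t, t_deliver0, lam_src, k) - count_events(t, t_pop0, lam_rcv, k)

# Hypothetical example: 5 messages delivered every 10 time units starting at 0,
# popped every 11 time units starting at 25.
for t in (24, 25, 40, 1000):
    print(t, buffer_load(t, 0, 10, 25, 11, 5))
```

Taking the minimum with `k` reproduces the third branch of each piecewise definition, since the middle branch never exceeds `k` inside its own range.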

To determine the maximum load for the receive buffer, we consider the case of a fast source clock and a
slow receiver clock. Thus:

$$\lambda_{SRC} = \lambda_{SRC,fast} = \Delta_{stream}/(1 + \rho_0) \tag{4.75}$$

$$\lambda_{RCV} = \lambda_{RCV,slow} = (1 + \rho_0)\Delta_{stream} \tag{4.76}$$

Assume that the first message is delivered at the earliest possible time. That is:

$$t_{deliver,0} = c_{RCV}(T_{RCV,E,0} - \Delta_{PP,RCV}|_{\text{abs-max}}) \tag{4.77}$$

The time of the first pop action is:

$$t_{pop,0} = c_{RCV}(T_{RCV,E,0} + \Delta_{PROC,begin} + 1) = t_{deliver,0} + (1 + \rho_0)(\Delta_{PP,RCV}|_{\text{abs-max}} + \Delta_{PROC,begin} + 1) \tag{4.78}$$

Since the source has a faster clock, the number of buffered messages can increase up to the instant the last
message is delivered (i.e., $t = t_{deliver,K-1} = t_{deliver,0} + (K-1)\lambda_{SRC}$). Thus, the maximum buffer load is given by
$Q_{\text{Async-Buffer}}$ evaluated at $t_{deliver,K-1}$.

$$\begin{aligned} Q_{\text{Async-Buffer}}(t)|_{\max} &= Q_{\text{Async-Buffer}}(t_{deliver,K-1}) \\ &= K - \lfloor [(K-1)\lambda_{SRC,fast} - (1 + \rho_0)(\Delta_{PP,RCV}|_{\text{abs-max}} + \Delta_{PROC,begin} + 1)]/\lambda_{RCV,slow} + 1 \rfloor \\ &= K - \lfloor (K-1)/(1 + \rho_0)^2 - (\Delta_{PP,RCV}|_{\text{abs-max}} + \Delta_{PROC,begin} + 1)/\Delta_{stream} + 1 \rfloor \end{aligned} \tag{4.79}$$
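
The closed form in (4.79) can be evaluated with the following Python sketch. The function name and sample values are hypothetical. One detail is an addition of this sketch rather than part of (4.79): the popped-message count is confined to the range 0 to K, as in the piecewise definition (4.73), so that short streams whose last message arrives before the first pop are handled sensibly.

```python
from fractions import Fraction
import math

def max_async_buffer_load(k, delta_stream, delta_abs_max, delta_proc_begin, rho0):
    """Maximum load of the asynchronous receive buffer per equation (4.79).

    All delta arguments are integer counts of local clock ticks.
    """
    drift = (1 + Fraction(rho0)) ** 2
    lag = Fraction(delta_abs_max + delta_proc_begin + 1, delta_stream)
    popped = math.floor(Fraction(k - 1) / drift - lag + 1)
    popped = max(0, min(k, popped))  # Q_pop stays within [0, K], as in (4.73)
    return k - popped

# Hypothetical example: 100 messages, one every 10 ticks, worst-case error of 4 ticks,
# processing delay of 5 ticks.
print(max_async_buffer_load(100, 10, 4, 5, 0))
print(max_async_buffer_load(100, 10, 4, 5, Fraction(1, 100)))
```

With perfect clocks the buffer in this example never holds more than one message at the end of the stream; a drift bound of one percent raises the load to three, which illustrates how clock drift accumulates over a long stream.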


4.4.4.2. Separate message synchronization and buffering 

Figure 4.4 illustrates the interconnection of functions for this case. The receiver is assumed to hold
the message until it is processed by the synchronizer. For this synchronization mechanism, the input rate
must be slower than the local clock frequency to ensure at least one triggering edge of the oscillator clock
per delivered message. Thus, $\Delta_{stream}$ must be at least 2 (i.e., $\Delta_{stream} \geq 2$).

Figure 4.4: Reception using separate message synchronization and buffering


This configuration differs from the one using the asynchronous FIFO in that the synchronization is 
performed by a dedicated synchronizer. At a minimum, this element introduces a one-tick delay in the
transfer of messages from the Link Receiver to the Receive Buffer. The worst-case delay in storing the 
message in the buffer is two oscillator clock ticks. Therefore, compared to the timing of the circuit with 
the asynchronous FIFO, the writing of streamed messages to the synchronous FIFO buffer begins at least 
1 local tick later and can end up to 2 oscillator clock ticks later. 

To determine the maximum load for the receive buffer, we consider the case of a fast source clock and 
a slow receiver clock. The maximum load is assessed at the earliest time at which the last message of the 
input stream can be written to the buffer. Let $t_{write,0}$ denote the earliest real time at which a received 
message is written to the buffer. 

$t_{write,0} = c_{RCV}(T_{RCV,E,0} - \Delta_{PP,RCV}|_{abs-max} + 1) = t_{deliver,0} + (1 + \rho_0)$ (4.80)

The delivery time for the last message is: 

$t_{deliver,K-1} = t_{deliver,0} + (K-1)\lambda_{SRC,fast}$ (4.81)

After delivery, the message must be synchronized and written to the buffer. In the fastest case, the 
delivered message is immediately read by the synchronizer and presented to the buffer for loading, which 
will then occur 1 tick later. 

$t_{write,K-1} = t_{deliver,K-1} + (1 + \rho_0) = t_{deliver,0} + (K-1)\lambda_{SRC,fast} + (1 + \rho_0)$ (4.82)

Let $Q_{Sync-Buffer}(t)$ denote the number of messages held by the synchronous receive buffer at time t. The 
maximum load is given by: 

$Q_{Sync-Buffer}(t)|_{max} = Q_{Sync-Buffer}(t_{write,K-1})$

$= K - Q_{pop}(t_{write,K-1})$

$= K - \lfloor (t_{write,K-1} - t_{pop,0})/\lambda_{RCV,slow} + 1 \rfloor$

$= K - \lfloor (K-1)/(1 + \rho_0)^2 - (\Delta_{PP,RCV}|_{abs-max} + \Delta_{PROC,begin})/\Delta_{stream} + 1 \rfloor$ (4.83)
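As a rough numerical illustration, the closed forms in (4.79) and (4.83) can be evaluated side by side. The sketch below is not part of the original analysis: the function and parameter names are hypothetical, the sample values are arbitrary, and exact rational arithmetic is used so that the floor is not disturbed by floating-point rounding.

```python
from fractions import Fraction
from math import floor


def max_async_buffer_load(k, rho0, d_stream, d_pp_rcv_absmax, d_proc_begin):
    """Evaluate (4.79) as written: worst-case load with the asynchronous FIFO."""
    popped = floor(Fraction(k - 1) / (1 + rho0) ** 2
                   - Fraction(d_pp_rcv_absmax + d_proc_begin + 1, d_stream)
                   + 1)
    return k - popped


def max_sync_buffer_load(k, rho0, d_stream, d_pp_rcv_absmax, d_proc_begin):
    """Evaluate (4.83) as written: worst-case load with the separate synchronizer."""
    popped = floor(Fraction(k - 1) / (1 + rho0) ** 2
                   - Fraction(d_pp_rcv_absmax + d_proc_begin, d_stream)
                   + 1)
    return k - popped


# Arbitrary sample values (assumptions, not taken from the document):
# 10 messages, drift bound 1e-4, stream period 4 ticks, 2-tick reception
# error bound, 1-tick processing start delay.
rho0 = Fraction(1, 10_000)
print(max_async_buffer_load(10, rho0, 4, 2, 1))  # 2
print(max_sync_buffer_load(10, rho0, 4, 2, 1))   # 1
```

With a zero drift bound the asynchronous case drops to a load of 1 for the same inputs, which shows how the $(1 + \rho_0)^2$ term alone can add a buffered message.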
5. Clock synchronization protocols 


This section examines the timing aspects for the local-time synchronization scheme. The diagnostic 
system works in close coordination with the clock synchronization system to determine the status of the 
bus and to specify the nodes eligible to participate in clock synchronization operations. That aspect of the 
ROBUS is outside the scope of this section. The analysis presented here uses the fundamental fault- 
tolerance concepts presented in Appendix A of [Torres 05] and the point-to-point communication 
concepts presented in Section 4 of this document. 


5.1. Clock synchronization system 

The allowed range for the tick duration $\tau_x$ of an oscillator is determined by the nominal tick duration $\tau_0$ 
and the bound on the drift rate $\rho_0$. 

$\tau_0/(1 + \rho_0) \leq \tau_x \leq (1 + \rho_0)\tau_0$ (5.1)

That is, an actual oscillator has a tick duration between $1/(1 + \rho_0)$ and $(1 + \rho_0)$ nominal ticks. 

The local-time clock of a node is essentially a counter driven by the local physical oscillator. The 
local time is equal to the state of the counter. Resetting the counter sets the local time to 0. The clock 
synchronization system enables the nodes to use the local time as a reference for the coordination of 
distributed operations. A basic requirement for proper distributed coordination is that the relative clock 
skews remain within known bounds. The relative skew between two clocks is the real time elapsed from 
the instant one clock makes a particular state transition (i.e., the count reaches a particular value) until the 
other clock makes the same transition. In general, the relative skew between two events is equal to the 
real time elapsed between the occurrence of the events. Bounded relative skew is achieved by the 
generation and preservation of approximate real-time agreement on the transitions of the local-time clock. 
The synchronization protocols deliver high-precision distributed events used as references to reset the 
local-time clocks. The state of a local-time clock indicates the time elapsed since the last 
synchronization-reset event. The bound on the relative skew between synchronized clocks is tightest at 
the time of the synchronization reset. After the reset, the local times can drift apart from each other and 
from real time at rates determined by the drift rates of the oscillators. The clocks are resynchronized at 
regular time intervals in order to ensure that the relative skews remain within known bounds. 

Figure 5.1 illustrates the conceptual mode transitions for the clock synchronization system. Normally 
there is a clique executing the Synchronization Preservation (SP) protocol to ensure that their relative 
local-time skews remain within known bounds. Nodes in this mode are said to be in a synchronized 
state. A goal of every node is to reach and remain in this state. In the context of the synchronization 
system, nodes operating in a mode other than Synchronization Preservation are referred to as recovering 
nodes. After a power-on enable or the detection of a failure, a node examines the activity on the bus. If a 
clique is found, the recovering node transitions to Synchronization Acquisition (SA) mode in order to 
synchronize its local time to the time of the clique. If a clique is not found, the recovering node 
transitions to the Initial Synchronization (IS) mode. After achieving synchronization, the recovering node 
transitions to Synchronization Preservation mode. 

Figure 5.1: Conceptual mode transitions for the clock synchronization system 

At the time of entry into the Synchronization Acquisition mode, the recovering node is in an 
asynchronous state in which there is no significant relation between its local time and the local time of 
the clique. The recovering node uses an Accept function to capture the synchronization events in the 
agreement propagation phase of the Synchronization Preservation protocol. This requires that the Accept 
function only receive synchronization messages from the same execution of the protocol. This is 
accomplished by enabling the Accept function after a frame synchronization step in which the gap 
between executions of the Synchronization Preservation protocol is found. 

In general, a group of nodes enters the Initial Synchronization mode within a time interval of known 
bounded duration. When a recovering node enters this mode, it expects that there is at least one node of 
the opposite kind that also makes the transition within the bounded time interval. This interval duration is 
in effect a bound on the relative local-time skew for the initializing nodes. Before the execution of the 
synchronization protocol, these nodes are said to be in an unsynchronized state since the initial skew 
bound can be relatively large compared to the skew after the execution of the protocol. 

Figure 5.2 illustrates how the mode transitions are related in time. A group of nodes enters Initial 
Synchronization with a large bound on the relative skew, denoted by $\pi_{IS}$. At the end of the protocol 
execution, the local time is set to 0 with the bound on the relative skew reduced to the level required for 
normal operation, denoted by $\pi_{SP}$. At local time $T_{SP}$, the Synchronization Preservation protocol is 
executed to ensure that the skew remains within the expected bound. This cyclic operation continues 
until a failure occurs or the system is shut down. A recovering node in Synchronization Acquisition 
trying to synchronize to the clique executes the Frame Synchronization (FS) protocol followed by the 
Synchronization Capture (SC) protocol. The duration of the Frame Synchronization protocol execution 
depends on factors such as the total number of nodes of the opposite kind, the number of untrustworthy nodes 
of the opposite kind active on the bus, the bound on the relative local-time skew of the nodes, and the 
position of the start of the protocol relative to the local time of the clique nodes. Synchronization Capture is 
enabled immediately after the execution of Frame Synchronization is complete. The relative skew 
achieved by Synchronization Acquisition is within the bounds of the skew for normal operation. 

Figure 5.2: Timing of mode transitions for the clock synchronization system 


The Initial Synchronization, Synchronization Preservation, and Synchronization Capture protocols are 
based on the same theory of distributed computation using Accept functions to process timing events. 
Figure 5.3 illustrates the message flow graph examined in this section. This graph includes all the 
processes and messages required for the three protocols. 



Figure 5.3: Combined message flow graph for the analysis of the synchronization protocols 


5.2. Stage 1: P0 to P1 


Figure 5.4 illustrates the detailed message flow graph for stage 1 in a 3x3 system (i.e., 3 BIUs and 3 
RMUs). 


Figure 5.4: Detailed message flow graph for stage 1 in a 3x3 system 


5.2.1. Expected time of reception for process P1 

The analysis for point-to-point communication presented in Section 4 of this document can be 
leveraged for the problem of determining the local time range of reception of INIT messages in process 
P1. This is covered in Section 5.9.2.1 for the Initial Synchronization and Synchronization Preservation 
protocols. 


5.2.2. Bound on the observed relative skew of received messages for process P1 

Let $\Pi_{P1,RCV}$ denote the bound on the relative skew observed in process P1 for the received messages 
from process P0 at trustworthy BIUs. $\Pi_{P1,RCV}$ is measured in local clock ticks. $\Pi_{P1,RCV}$ is used to check 
for agreement among the received inputs and also to check agreement with the result of the Accept 
output. 

Let $T_{P0}$ denote the local time at which a BIU node sends the INIT message in process P0 (i.e., the 
local time when the source's Computation Module signals the Communication Module to send the INIT 
message). $t_{P0,l}$ and $t_{P0,h}$ denote the earliest and latest real times, respectively, at which the trustworthy BIU 
nodes send INIT in process P0. Let $\pi_{P0}$ denote the bound on the relative local-time skew for the 
trustworthy BIUs. $\pi_{P0}$ is assumed to apply for the duration of the protocol execution. $\pi_{P0}$ also bounds the 
precision with which the trustworthy BIU nodes send the INIT messages. 


$\pi_{P0} = t_{P0,h} - t_{P0,l}$ (5.2)

Let $t_{P1,RCV,l}$ and $t_{P1,RCV,h}$ denote the earliest and latest real times, respectively, at which an INIT message 
from a trustworthy BIU node can be received in process P1 at the trustworthy RMUs. 

$t_{P1,RCV,l} = t_{P0,l} + \Gamma_{PP,l}$ (5.3)

$t_{P1,RCV,h} = t_{P0,h} + \Gamma_{PP,h}$ (5.4)

Let $T_{P1,RCV,l}$ and $T_{P1,RCV,h}$ denote the earliest and latest local times, respectively, at which a node in process 
P1 can receive messages from process P0 at trustworthy BIU nodes. $\Delta_{P1,RCV}$ denotes the measured skew 
between the earliest and latest received messages from trustworthy BIU nodes (i.e., $\Delta_{P1,RCV} = T_{P1,RCV,h} - T_{P1,RCV,l}$). 
We need to determine the maximum value of $\Delta_{P1,RCV}$. Using (5.3), (5.4), and the local-clock 
function: 

$c_{RCV}(T_{P1,RCV,h}) - c_{RCV}(T_{P1,RCV,l}) \leq t_{P1,RCV,h} - t_{P1,RCV,l}$ (5.5)

From the constraint that the drift rate of the local clocks be $\rho_0$-bounded and the definition of $\Delta_{P1,RCV}$: 

$\Delta_{P1,RCV}/(1 + \rho_0) \leq c_{RCV}(T_{P1,RCV,h}) - c_{RCV}(T_{P1,RCV,l})$ (5.6)

Combining (5.5) and (5.6), and using the fact that $\Delta_{P1,RCV}$ is an integer: 

$\Delta_{P1,RCV} \leq \lfloor (1 + \rho_0)(t_{P1,RCV,h} - t_{P1,RCV,l}) \rfloor$ (5.7)

$\Pi_{P1,RCV}$ is given by the maximum value of $\Delta_{P1,RCV}$: 

$\Pi_{P1,RCV} = \Delta_{P1,RCV}|_{max} = \lfloor (1 + \rho_0)(t_{P1,RCV,h} - t_{P1,RCV,l}) \rfloor = \lfloor (1 + \rho_0)(\pi_{P0} + \epsilon_{PP}) \rfloor$ (5.8)
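The conversion from a real-time window to a bound in local clock ticks, as used in (5.7) and (5.8), can be sketched as follows. This is an illustrative sketch only: the function names and the sample values are assumptions, and exact fractions are used so that the floor is computed without rounding error.

```python
from fractions import Fraction
from math import floor


def observed_skew_bound(rho0, real_time_window):
    """Largest tick count a rho0-bounded clock can measure over the window, as in (5.7)."""
    return floor((1 + rho0) * real_time_window)


def pi_p1_rcv(rho0, pi_p0, eps_pp):
    """Evaluate (5.8): the real-time window is pi_P0 + eps_PP."""
    return observed_skew_bound(rho0, pi_p0 + eps_pp)


# Arbitrary sample values (assumptions): windows expressed in nominal ticks.
print(pi_p1_rcv(Fraction(1, 10_000), 3, Fraction(3, 2)))  # 4
print(pi_p1_rcv(Fraction(1, 100), 90, 10))                # 101
```

The second call uses a deliberately large drift bound to show the effect of the $(1 + \rho_0)$ factor: a 100-tick real-time window can be observed as 101 local ticks.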


5.2.3. Relative skew of the Accept outputs for process P1 

Let $\Delta_{P1}$ denote the delay (in local-clock ticks) of the Computation Process in process P1 measured 
from the local time of reception of the selected message until the Accept output is asserted. $t_{P1,A,l}$ and 
$t_{P1,A,h}$ denote the earliest and latest real times, respectively, at which an Accept output in process P1 at the 
trustworthy RMUs can be asserted. 

$t_{P1,A,l} = t_{P0,l} + \Gamma_{PP,l} + \Delta_{P1}/(1 + \rho_0)$ (5.9)

$t_{P1,A,h} = t_{P0,h} + \Gamma_{PP,h} + (1 + \rho_0)\Delta_{P1}$ (5.10)

Therefore, the Accept functions of the trustworthy RMU nodes assert their outputs during a real-time 
interval with the following duration: 

$t_{P1,A,h} - t_{P1,A,l} = [\pi_{P0} + \Gamma_{PP,h} + (1 + \rho_0)\Delta_{P1}] - [\Gamma_{PP,l} + \Delta_{P1}/(1 + \rho_0)]$

$= \pi_{P0} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]\Delta_{P1}$ (5.11)

Let AEV_P1 denote the set of asymmetric BIU eligible voters in process P1 at a trustworthy RMU node. 
|AEV_P1| denotes the cardinality of AEV_P1. $\pi_{P1,A}$ denotes the bound on the real-time relative skew of 
the Accept outputs in process P1 at the trustworthy RMUs. If |AEV_P1| = 0 for each trustworthy RMU 
node, they essentially accept on the same message. 

$\pi_{P1,A}|_{|AEV\_P1| = 0} = \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]\Delta_{P1}$ (5.12)

If |AEV_P1| ≠ 0 for some trustworthy RMU, all we know with certainty is that the RMU nodes accept on 
a message from a trustworthy BIU node or a message from an untrustworthy BIU node flanked by 
messages from trustworthy BIU nodes. 

$\pi_{P1,A}|_{|AEV\_P1| \neq 0} = \pi_{P0} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]\Delta_{P1}$ (5.13)

From this point on, unless otherwise stated: 

$\pi_{P1,A} = \pi_{P1,A}|_{max} = \pi_{P1,A}|_{|AEV\_P1| \neq 0}$ (5.14)
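The two cases (5.12) and (5.13) differ only by the $\pi_{P0}$ term, which is why (5.14) takes the asymmetric case as the maximum. A minimal sketch of that comparison is given below; the function names and sample values are hypothetical and are not part of the original analysis.

```python
from fractions import Fraction


def drift_spread(rho0):
    """The factor [(1 + rho0) - 1/(1 + rho0)] that multiplies local-tick delays."""
    return (1 + rho0) - 1 / (1 + rho0)


def accept_skew_p1(rho0, pi_p0, eps_pp, d_p1):
    """Return (no_asymmetric, with_asymmetric) per (5.12) and (5.13)."""
    no_asymmetric = eps_pp + drift_spread(rho0) * d_p1
    with_asymmetric = pi_p0 + no_asymmetric
    return no_asymmetric, with_asymmetric


# Arbitrary sample values (assumptions), in nominal ticks.
rho0 = Fraction(1, 10_000)
no_asym, with_asym = accept_skew_p1(rho0, pi_p0=2, eps_pp=2, d_p1=4)
print(float(no_asym), float(with_asym))
print(max(no_asym, with_asym) == with_asym)  # (5.14): the asymmetric case dominates
```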

5.3. Stage 2: P1 to P2 

Figure 5.5 illustrates the detailed message flow graph for stages 1 and 2 in a 3x3 system. 

Figure 5.5: Detailed message flow graph for stages 1 and 2 in a 3x3 system 


5.3.1. Effective reception delay for process P2 

Let $B_{P0}$ denote the send delay for process P0. We want to compute the effective reception delay for 
process P2. In general, this delay is measured from the time of some local event to the time of reception. 
We use $T_{P0}$, the local time of transmission of the INIT message in process P0, as the local reference event 
to measure the reception delay from process P0 to process P2. Note that instead of the start of the 
protocol, we choose the send time for process P0 as the reference time to measure the reception delay. 
This approach enables the analysis of the synchronization protocols independently of $B_{P0}$. $B_{P0}$ is 
computed based on a single-stage point-to-point synchronous communication model (see Section 5.9.2.1). 

We need to determine the earliest and latest real times of reception for process P2. Let $B_{P1}$ denote the 
send delay for process P1. $t_{P2,RCV,l}$ and $t_{P2,RCV,h}$ denote the earliest and latest real times, respectively, at 
which an INIT message from a trustworthy RMU node can be received in process P2 at the trustworthy 
BIUs. 

$t_{P2,RCV,l} = t_{P1,A,l} + B_{P1}/(1 + \rho_0) + \Gamma_{PP,l} = t_{P0,l} + 2\Gamma_{PP,l} + (\Delta_{P1} + B_{P1})/(1 + \rho_0)$ (5.15)

$t_{P2,RCV,h} = t_{P1,A,h} + (1 + \rho_0)B_{P1} + \Gamma_{PP,h} = t_{P0,l} + \pi_{P0} + 2\Gamma_{PP,h} + (1 + \rho_0)(\Delta_{P1} + B_{P1})$ (5.16)

$\Gamma_{P0-P2,l}$ denotes the minimum effective message-reception delay for INIT messages in process P2 and is 
measured from the latest time at which the trustworthy BIU nodes can send INIT to the earliest time at 
which the BIU nodes can receive INIT messages from the trustworthy RMU nodes. 

$\Gamma_{P0-P2,l} = t_{P2,RCV,l} - t_{P0,h} = 2\Gamma_{PP,l} + (\Delta_{P1} + B_{P1})/(1 + \rho_0) - \pi_{P0}$ (5.17)

$\Gamma_{P0-P2,h}$ denotes the maximum effective message-reception delay for INIT messages in process P2 and is 
measured from the earliest time at which the trustworthy BIU nodes can send INIT to the latest time at 
which the BIU nodes can receive INIT messages from the trustworthy RMU nodes. 

$\Gamma_{P0-P2,h} = t_{P2,RCV,h} - t_{P0,l} = \pi_{P0} + 2\Gamma_{PP,h} + (1 + \rho_0)(\Delta_{P1} + B_{P1})$ (5.18)

The expected reception delay for process P2 is: 

$R_{P0-P2} = IMP(\Gamma_{P0-P2,l}, \Gamma_{P0-P2,h})$ (5.19)

The total effective uncertainty in the real time of reception of the INIT messages in process P2 is: 

$\Gamma_{P0-P2,h} - \Gamma_{P0-P2,l} = 2\pi_{P0} + 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](\Delta_{P1} + B_{P1})$ (5.20)
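The bounds (5.17) and (5.18) and the uncertainty (5.20) can be cross-checked numerically. The sketch below is illustrative only: the names and sample values are assumptions, and it takes the point-to-point uncertainty to be the difference between the maximum and minimum point-to-point delays, as the step from (5.9) and (5.10) to (5.11) implies.

```python
from fractions import Fraction


def effective_delay_p0_p2(rho0, pi_p0, g_pp_l, g_pp_h, d_p1, b_p1):
    """Return the (minimum, maximum) effective reception delay per (5.17) and (5.18)."""
    lo = 2 * g_pp_l + Fraction(d_p1 + b_p1) / (1 + rho0) - pi_p0
    hi = pi_p0 + 2 * g_pp_h + (1 + rho0) * (d_p1 + b_p1)
    return lo, hi


# Arbitrary sample values (assumptions), in nominal ticks.
rho0 = Fraction(1, 10_000)
pi_p0, g_pp_l, g_pp_h, d_p1, b_p1 = 2, 3, 5, 4, 6
lo, hi = effective_delay_p0_p2(rho0, pi_p0, g_pp_l, g_pp_h, d_p1, b_p1)

eps_pp = g_pp_h - g_pp_l                 # assumed definition of the uncertainty
spread = (1 + rho0) - 1 / (1 + rho0)
# (5.20): the width of the interval matches the closed form exactly.
print(hi - lo == 2 * pi_p0 + 2 * eps_pp + spread * (d_p1 + b_p1))
print(float(lo), float(hi))
```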


5.3.2. Expected time of reception for process P2 

The BIU nodes expect to receive INIT messages at local time $T_{P2,RCV,E}$. 

$T_{P2,RCV,E} = T_{P0} + R_{P0-P2}$ (5.21)

The real-time error for $T_{P2,RCV,E}$ is bounded as follows. A BIU node will receive an INIT message from a 
trustworthy RMU node no earlier than $\mu_{P0-P2,l}$ nominal ticks from $T_{P2,RCV,E}$. 

$\mu_{P0-P2,l} = (1 + \rho_0)R_{P0-P2} - \Gamma_{P0-P2,l}$ (5.22)

A BIU node will receive an INIT message from a trustworthy RMU node no later than $\mu_{P0-P2,h}$ nominal 
ticks from $T_{P2,RCV,E}$. 

$\mu_{P0-P2,h} = \Gamma_{P0-P2,h} - R_{P0-P2}/(1 + \rho_0)$ (5.23)

Let $T_{P2,RCV}$ denote the actual local time at a BIU node when an INIT message from a trustworthy RMU 
node is received. In addition, let $\Delta_{P2,RCV}$ denote the local-time error in $T_{P2,RCV}$. 

$\Delta_{P2,RCV} = T_{P2,RCV} - T_{P2,RCV,E}$ (5.24)

We want to determine a bound for the local-time error in the actual time of reception in process P2, 
denoted by $\Delta_{P2,RCV}|_{max}$. 

$|T_{P2,RCV} - T_{P2,RCV,E}| \leq \Delta_{P2,RCV}|_{max}$ (5.25)

$\Delta_{P2,RCV}|_{max}$ is derived as follows. We know that the difference between the expected and the actual time of 
reception at a BIU node for INIT messages from the trustworthy RMU nodes is bounded by $\mu_{P0-P2,l}$ and 
$\mu_{P0-P2,h}$, such that: 

$c_{RCV}(T_{P2,RCV,E}) - \mu_{P0-P2,l} \leq c_{RCV}(T_{P2,RCV}) \leq c_{RCV}(T_{P2,RCV,E}) + \mu_{P0-P2,h}$ (5.26)

So: 

$|c_{RCV}(T_{P2,RCV}) - c_{RCV}(T_{P2,RCV,E})| \leq \max(\mu_{P0-P2,l}, \mu_{P0-P2,h})$ (5.27)

From the constraint that the local clocks be $\rho_0$-bounded and (5.24): 

$|\Delta_{P2,RCV}|/(1 + \rho_0) \leq |c_{RCV}(T_{P2,RCV}) - c_{RCV}(T_{P2,RCV,E})|$ (5.28)

Combining (5.27) and (5.28): 

$|\Delta_{P2,RCV}| \leq (1 + \rho_0)\max(\mu_{P0-P2,l}, \mu_{P0-P2,h})$ (5.29)

Since $\Delta_{P2,RCV}$ is an integer: 

$|\Delta_{P2,RCV}| \leq \lfloor (1 + \rho_0)\max(\mu_{P0-P2,l}, \mu_{P0-P2,h}) \rfloor$ (5.30)

Therefore: 

$\Delta_{P2,RCV}|_{max} = \lfloor (1 + \rho_0)\max(\mu_{P0-P2,l}, \mu_{P0-P2,h}) \rfloor$ (5.31)
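The chain from (5.22) and (5.23) to (5.31) can be evaluated directly once the effective delay bounds and the expected delay are known. In the sketch below the expected delay is passed in as a given integer rather than computed, since IMP is defined elsewhere in the document; the function name and the sample values are assumptions made for illustration.

```python
from fractions import Fraction
from math import floor


def max_local_time_error(rho0, r_expected, gamma_l, gamma_h):
    """Evaluate (5.31) from the early and late real-time errors (5.22) and (5.23)."""
    mu_l = (1 + rho0) * r_expected - gamma_l      # (5.22)
    mu_h = gamma_h - r_expected / (1 + rho0)      # (5.23)
    return floor((1 + rho0) * max(mu_l, mu_h))    # (5.31)


# Arbitrary sample values (assumptions): effective delay between 14 and 22 ticks.
rho0 = Fraction(1, 10_000)
print(max_local_time_error(rho0, 18, 14, 22))  # 4
print(max_local_time_error(rho0, 16, 14, 22))  # 6
```

In this example an expected delay near the middle of the interval gives the smaller error bound, because the larger of the two one-sided errors determines the result.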


5.3.3. Bound on the observed relative skew of received messages for process P2 

Let $\Pi_{P2,RCV}$ denote the bound on the relative skew observed in process P2 for the received messages 
from process P1 at trustworthy RMUs. $\Pi_{P2,RCV}$ is measured in local clock ticks. $\Pi_{P2,RCV}$ is used to check 
for agreement among the received inputs and also to check agreement with the result of the Accept 
output. 

The worst-case relative skew for received messages occurs when there are asymmetric eligible voters 
in process P1 at the trustworthy RMU nodes. The bound on the relative skew of the Accept outputs in 
process P1 is $\pi_{P1,A}$. The additional uncertainty in the reception delay measured from the time of the 
Accept outputs to the time of reception in process P2 is $\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P1}$. 

$\Pi_{P2,RCV} = \lfloor (1 + \rho_0)(t_{P2,RCV,h} - t_{P2,RCV,l}) \rfloor$

$= \lfloor (1 + \rho_0)\{\pi_{P1,A} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P1}\} \rfloor$

$= \lfloor (1 + \rho_0)\{\pi_{P0} + 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](\Delta_{P1} + B_{P1})\} \rfloor$ (5.32)
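The last two lines of (5.32) are related by substituting the worst-case stage 1 skew from (5.13). The following sketch checks that substitution with exact fractions; the function names and sample values are hypothetical, and the inputs should be integers or `Fraction` values so that the equality check is exact.

```python
from fractions import Fraction
from math import floor


def drift_spread(rho0):
    """The factor [(1 + rho0) - 1/(1 + rho0)]."""
    return (1 + rho0) - 1 / (1 + rho0)


def pi_p2_rcv(rho0, pi_p0, eps_pp, d_p1, b_p1):
    """Evaluate (5.32), checking that its last two lines agree."""
    s = drift_spread(rho0)
    pi_p1_a = pi_p0 + eps_pp + s * d_p1                       # (5.13), worst case
    via_stage1 = (1 + rho0) * (pi_p1_a + eps_pp + s * b_p1)
    closed_form = (1 + rho0) * (pi_p0 + 2 * eps_pp + s * (d_p1 + b_p1))
    assert via_stage1 == closed_form
    return floor(closed_form)


# Arbitrary sample values (assumptions), in nominal ticks.
print(pi_p2_rcv(Fraction(1, 10_000), pi_p0=2, eps_pp=2, d_p1=4, b_p1=6))  # 6
```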

5.3.4. Relative skew of the Accept outputs for process P2 

Let AEV_P2 denote the set of asymmetric RMU eligible voters in process P2 at a trustworthy BIU 
node. Let $\pi_{P2,A}$ denote the bound on the real-time relative skew of the Accept outputs in process P2 at 
trustworthy BIUs. $\Delta_{P2}$ denotes the delay (in local-clock ticks) of the Computation Process in process P2 
measured from the local time of reception of the selected message to the local time when the Accept 
output is asserted. If |AEV_P1| = 0 for each trustworthy RMU node, the trustworthy BIU nodes may have 
asymmetric RMU nodes in their sets of eligible voters for process P2 (i.e., |AEV_P2| ≠ 0 for some 
trustworthy BIUs). In this case, the trustworthy BIU nodes accept within the time range delimited by 
messages from trustworthy RMU nodes. 

$\pi_{P2,A}|_{|AEV\_P2| \neq 0} = \pi_{P1,A}|_{|AEV\_P1| = 0} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P1} + \Delta_{P2})$

$= 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](\Delta_{P1} + B_{P1} + \Delta_{P2})$ (5.33)

If |AEV_P1| ≠ 0 for some trustworthy RMU nodes, the trustworthy BIU nodes do not have asymmetric 
RMU nodes in their sets of eligible voters for process P2 (i.e., |AEV_P2| = 0 at each trustworthy BIU). In 
this case, the BIU nodes essentially accept on the same message. 

$\pi_{P2,A}|_{|AEV\_P2| = 0} = \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]\Delta_{P2}$ (5.34)

From this point on, unless otherwise stated: 

$\pi_{P2,A} = \pi_{P2,A}|_{max} = \pi_{P2,A}|_{|AEV\_P2| \neq 0}$ (5.35)

5.4. Stage 3: P2 to P3 

Figure 5.6 illustrates the detailed message flow graph up to stage 3 for a 3x3 system. 


Figure 5.6: Detailed message flow graph for stages 1 through 3 in a 3x3 system 

5.4.1. Effective reception delay for process P3 

We need to determine the earliest and latest real times of reception of ECHO messages in process P3. 
Let $T_{P1,A,i}$ denote the local time at RMU node i when it asserts the output of its Accept function in process 
P1. Let $t_{P1,A,l}$ and $t_{P1,A,h}$ denote the earliest and latest real times, respectively, at which the trustworthy 
RMUs can assert the Accept outputs in process P1. 

$\pi_{P1,A} = t_{P1,A,h} - t_{P1,A,l}$ (5.36)

Let $B_{P2}$ denote the send delay for process P2. $t_{P3,RCV,l}$ and $t_{P3,RCV,h}$ denote the earliest and latest real times, 
respectively, at which ECHO messages from trustworthy BIU nodes can be received by an RMU node in 
process P3. 

$t_{P3,RCV,l} = t_{P1,A,l} + 2\Gamma_{PP,l} + (B_{P1} + \Delta_{P2} + B_{P2})/(1 + \rho_0)$ (5.37)

$t_{P3,RCV,h} = t_{P1,A,h} + 2\Gamma_{PP,h} + (1 + \rho_0)(B_{P1} + \Delta_{P2} + B_{P2})$ (5.38)

$\Gamma_{P1-P3,l}$ denotes the minimum effective message-reception delay for ECHO messages in process P3 and is 
measured from the latest time at which trustworthy RMU nodes can assert their Accept(INIT) outputs to 
the earliest time at which the RMU nodes receive ECHO messages from the trustworthy BIU nodes. 

$\Gamma_{P1-P3,l} = t_{P3,RCV,l} - t_{P1,A,h} = 2\Gamma_{PP,l} + (B_{P1} + \Delta_{P2} + B_{P2})/(1 + \rho_0) - \pi_{P1,A}$ (5.39)

$\Gamma_{P1-P3,h}$ denotes the maximum effective message-reception delay for ECHO messages in process P3 and is 
measured from the earliest time at which the trustworthy RMU nodes can assert their Accept(INIT) outputs 
to the latest time at which the RMU nodes can receive ECHO messages from the trustworthy BIU nodes. 

$\Gamma_{P1-P3,h} = t_{P3,RCV,h} - t_{P1,A,l} = \pi_{P1,A} + 2\Gamma_{PP,h} + (1 + \rho_0)(B_{P1} + \Delta_{P2} + B_{P2})$ (5.40)

The expected reception delay for process P3 is: 

$R_{P1-P3} = IMP(\Gamma_{P1-P3,l}, \Gamma_{P1-P3,h})$ (5.41)

The effective uncertainty in the real time of reception of the ECHO messages in process P3 is: 

$\Gamma_{P1-P3,h} - \Gamma_{P1-P3,l} = 2\pi_{P1,A} + 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P1} + \Delta_{P2} + B_{P2})$ (5.42)

5.4.2. Expected time of reception for process P3 

RMU node i expects to receive ECHO messages at local time $T_{P3,RCV,E,i}$. 

$T_{P3,RCV,E,i} = T_{P1,A,i} + R_{P1-P3}$ (5.43)

The real-time error for $T_{P3,RCV,E,i}$ is bounded as follows. RMU node i will receive an ECHO message from 
a trustworthy BIU node no earlier than $\mu_{P1-P3,l}$ nominal ticks from $T_{P3,RCV,E,i}$. 

$\mu_{P1-P3,l} = (1 + \rho_0)R_{P1-P3} - \Gamma_{P1-P3,l}$ (5.44)

RMU node i will receive an ECHO message from a trustworthy BIU node no later than $\mu_{P1-P3,h}$ nominal 
ticks from $T_{P3,RCV,E,i}$. 

$\mu_{P1-P3,h} = \Gamma_{P1-P3,h} - R_{P1-P3}/(1 + \rho_0)$ (5.45)

We want to determine the maximum local-time error for the actual time of reception at the RMU nodes, 
denoted by $\Delta_{P3,RCV}|_{max}$. 

$|T_{P3,RCV} - T_{P3,RCV,E}| \leq \Delta_{P3,RCV}|_{max}$ (5.46)

Following the analysis for process P2: 

$\Delta_{P3,RCV}|_{max} = \lfloor (1 + \rho_0)\max(\mu_{P1-P3,l}, \mu_{P1-P3,h}) \rfloor$ (5.47)

5.4.3. Bound on the observed relative skew of received messages for process P3 

Let $\Pi_{P3,RCV}$ denote the bound on the relative skew observed in process P3 for the received messages 
from trustworthy sources in process P2. $\Pi_{P3,RCV}$ is measured in local clock ticks. $\Pi_{P3,RCV}$ is used to check 
for agreement among the received inputs and also to check agreement with the result of the Accept 
output. 

The worst-case relative skew for received messages occurs when there are asymmetric eligible voters 
in process P2 at the trustworthy BIU nodes. The bound on the relative skew of the Accept outputs in 
process P2 is $\pi_{P2,A}$. The additional uncertainty in the reception delay measured from the time of the 
Accept outputs in process P2 to the time of reception in process P3 is $\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P2}$. 

$\Pi_{P3,RCV} = \lfloor (1 + \rho_0)(t_{P3,RCV,h} - t_{P3,RCV,l}) \rfloor$

$= \lfloor (1 + \rho_0)\{\pi_{P2,A} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P2}\} \rfloor$

$= \lfloor (1 + \rho_0)\{3\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](\Delta_{P1} + B_{P1} + \Delta_{P2} + B_{P2})\} \rfloor$ (5.48)


5.4.4. Relative skew of the Accept outputs for process P3 

Let AEV_P3 denote the set of asymmetric BIU eligible voters in process P3 at a trustworthy RMU 
node. Let $\pi_{P3,A}$ denote the bound on the real-time relative skew of the Accept outputs in process P3 at the 
trustworthy RMUs. $\Delta_{P3}$ denotes the delay (in local-clock ticks) of the Computation Process in process P3 
measured from the local time of reception of the selected message to the local time when the Accept 
output is asserted. If |AEV_P2| = 0 for each trustworthy BIU node, the trustworthy RMU nodes may have 
asymmetric BIU nodes in their sets of eligible voters for process P3 (i.e., |AEV_P3| ≠ 0 for some 
trustworthy RMU nodes). In this case, the trustworthy RMU nodes accept within the time range 
delimited by messages from trustworthy BIU nodes. 

$\pi_{P3,A}|_{|AEV\_P3| \neq 0} = \pi_{P2,A}|_{|AEV\_P2| = 0} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P2} + \Delta_{P3})$

$= 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](\Delta_{P2} + B_{P2} + \Delta_{P3})$ (5.49)

If |AEV_P2| ≠ 0 for some trustworthy BIU nodes, the trustworthy RMU nodes do not have asymmetric 
BIU nodes in their sets of eligible voters for process P3 (i.e., |AEV_P3| = 0 for each trustworthy RMU 
node). In this case, the RMU nodes essentially accept on the same message. 

$\pi_{P3,A}|_{|AEV\_P3| = 0} = \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]\Delta_{P3}$ (5.50)

From this point on, unless otherwise stated: 

$\pi_{P3,A} = \pi_{P3,A}|_{max} = \pi_{P3,A}|_{|AEV\_P3| \neq 0}$ (5.51)


5.5. Stage 4: P3 to P4 

Figure 5.7 illustrates the detailed message flow graph up to stage 4 for a 3x3 system. 

Figure 5.7: Detailed message flow graph for stages 1 through 4 in a 3x3 system 


5.5.1. Effective reception delay for process P4 

We need to determine the earliest and latest real times of reception of ECHO messages from 
trustworthy RMU nodes by the BIU nodes in process P4. Let $T_{P2,A,j}$ denote the local time at which 
trustworthy BIU node j asserts the output of its Accept function for process P2. $t_{P2,A,l}$ and $t_{P2,A,h}$ denote 
the earliest and latest real times, respectively, at which the trustworthy BIUs can assert their Accept 
outputs in process P2. 

$\pi_{P2,A} = t_{P2,A,h} - t_{P2,A,l}$ (5.52)

Let $B_{P3}$ denote the send delay for process P3. $t_{P4,RCV,l}$ denotes the earliest real time at which ECHO 
messages from trustworthy RMU nodes can be received by a BIU node in process P4. 

$t_{P4,RCV,l} = t_{P2,A,l} + 2\Gamma_{PP,l} + (B_{P2} + \Delta_{P3} + B_{P3})/(1 + \rho_0)$ (5.53)

$t_{P4,RCV,h}$ denotes the latest real time at which ECHO messages from trustworthy RMU nodes can be 
received by a BIU node in process P4. 

$t_{P4,RCV,h} = t_{P2,A,h} + 2\Gamma_{PP,h} + (1 + \rho_0)(B_{P2} + \Delta_{P3} + B_{P3})$ (5.54)

$\Gamma_{P2-P4,l}$ denotes the minimum effective message-reception delay for ECHO messages in process P4 and is 
measured from the latest time at which the trustworthy BIU nodes can assert their Accept(ECHO) outputs 
to the earliest time at which the BIU nodes can receive ECHO messages from the trustworthy RMU 
nodes. 

$\Gamma_{P2-P4,l} = t_{P4,RCV,l} - t_{P2,A,h} = 2\Gamma_{PP,l} + (B_{P2} + \Delta_{P3} + B_{P3})/(1 + \rho_0) - \pi_{P2,A}$ (5.55)

$\Gamma_{P2-P4,h}$ denotes the maximum effective message-reception delay for ECHO messages in process P4 and is 
measured from the earliest time at which the trustworthy BIU nodes assert their Accept(ECHO) outputs to 
the latest time at which the BIU nodes can receive ECHO messages from the trustworthy RMU nodes. 

$\Gamma_{P2-P4,h} = t_{P4,RCV,h} - t_{P2,A,l} = \pi_{P2,A} + 2\Gamma_{PP,h} + (1 + \rho_0)(B_{P2} + \Delta_{P3} + B_{P3})$ (5.56)

The expected reception delay for process P4 is: 

$R_{P2-P4} = IMP(\Gamma_{P2-P4,l}, \Gamma_{P2-P4,h})$ (5.57)

The effective uncertainty in the real time of reception of the ECHO messages in process P4 is: 

$\Gamma_{P2-P4,h} - \Gamma_{P2-P4,l} = 2\pi_{P2,A} + 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P2} + \Delta_{P3} + B_{P3})$ (5.58)


5.5.2. Expected time of reception for process P4 

BIU node j expects to receive ECHO messages at local time $T_{P4,RCV,E,j}$: 

$T_{P4,RCV,E,j} = T_{P2,A,j} + R_{P2-P4}$ (5.59)

The real-time error for $T_{P4,RCV,E,j}$ is bounded as follows. BIU node j will receive an ECHO message from a 
trustworthy RMU node no earlier than $\mu_{P2-P4,l}$ nominal ticks from $T_{P4,RCV,E,j}$: 

$\mu_{P2-P4,l} = (1 + \rho_0)R_{P2-P4} - \Gamma_{P2-P4,l}$ (5.60)

BIU node j will receive an ECHO message from a trustworthy RMU node no later than $\mu_{P2-P4,h}$ nominal 
ticks from $T_{P4,RCV,E,j}$: 

$\mu_{P2-P4,h} = \Gamma_{P2-P4,h} - R_{P2-P4}/(1 + \rho_0)$ (5.61)

We want to determine the maximum local-time error for the actual time of reception at the BIU nodes in 
process P4, denoted by $\Delta_{P4,RCV}|_{max}$. 

$|T_{P4,RCV} - T_{P4,RCV,E}| \leq \Delta_{P4,RCV}|_{max}$ (5.62)

Following the analysis for process P2: 

$\Delta_{P4,RCV}|_{max} = \lfloor (1 + \rho_0)\max(\mu_{P2-P4,l}, \mu_{P2-P4,h}) \rfloor$ (5.63)

5.5.3. Bound on the observed relative skew of received messages for process P4 

Let $\Pi_{P4,RCV}$ denote the bound on the relative skew observed in process P4 for the received messages 
from process P3 at trustworthy RMUs. $\Pi_{P4,RCV}$ is measured in local clock ticks. $\Pi_{P4,RCV}$ is used to check 
for agreement among the received inputs and also to check agreement with the result of the Accept 
output. 

The worst-case relative skew for received messages occurs when there are asymmetric eligible voters 
in process P3 at trustworthy RMU nodes. The bound on the relative skew of the Accept outputs in 
process P3 at the trustworthy RMUs is $\pi_{P3,A}$. The additional uncertainty in the reception delay measured 
from the time of the Accept outputs in process P3 to the time of reception in process P4 is 
$\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P3}$. 

$\Pi_{P4,RCV} = \lfloor (1 + \rho_0)(t_{P4,RCV,h} - t_{P4,RCV,l}) \rfloor$

$= \lfloor (1 + \rho_0)\{\pi_{P3,A} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P3}\} \rfloor$

$= \lfloor (1 + \rho_0)\{3\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](\Delta_{P2} + B_{P2} + \Delta_{P3} + B_{P3})\} \rfloor$ (5.64)


5.5.4. Relative skew of the Accept outputs for process P4 

Let AEV_P4 denote the set of asymmetric RMU eligible voters in process P4 at a trustworthy BIU 
node. Let $\pi_{P4,A}$ denote the bound on the real-time relative skew of the Accept outputs in process P4 at the 
trustworthy BIUs. $\Delta_{P4}$ denotes the delay (in local-clock ticks) of the Computation Process in process P4 
measured from the local time of reception of the selected message to the local time when the Accept 
output is asserted. If |AEV_P3| = 0 for each trustworthy RMU node, the trustworthy BIU nodes may have 
asymmetric RMU nodes in their sets of eligible voters for process P4 (i.e., |AEV_P4| ≠ 0 for some 
trustworthy BIU nodes). In this case, the BIU nodes accept within the time range delimited by messages 
from trustworthy RMU nodes. 

$\pi_{P4,A}|_{|AEV\_P4| \neq 0} = \pi_{P3,A}|_{|AEV\_P3| = 0} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P3} + \Delta_{P4})$

$= 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](\Delta_{P3} + B_{P3} + \Delta_{P4})$ (5.65)

If |AEV_P3| ≠ 0 for some trustworthy RMU nodes, the trustworthy BIU nodes do not have asymmetric 
RMU nodes in their sets of eligible voters for process P4 (i.e., |AEV_P4| = 0 for each trustworthy BIU 
node). In this case, the BIU nodes essentially accept on the same message. 

$\pi_{P4,A}|_{|AEV\_P4| = 0} = \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]\Delta_{P4}$ (5.66)

From this point on, unless otherwise stated: 

$\pi_{P4,A} = \pi_{P4,A}|_{max} = \pi_{P4,A}|_{|AEV\_P4| \neq 0}$ (5.67)
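Stages 1 through 4 follow the same pattern, so the per-stage bounds can be computed in one loop. The sketch below is an illustration under stated assumptions, not the authors' procedure: the function names, the dictionary layout, and the sample values are hypothetical, and inputs are integers or `Fraction` values so that the arithmetic is exact.

```python
from fractions import Fraction
from math import floor


def drift_spread(rho0):
    """The factor [(1 + rho0) - 1/(1 + rho0)]."""
    return (1 + rho0) - 1 / (1 + rho0)


def stage_bounds(rho0, pi_p0, eps_pp, delta, send):
    """Return (worst, observed) keyed by process number 1..4.

    delta[k] is the computation delay of process Pk (k = 1..4) and send[k]
    is the send delay of process Pk (k = 1..3), both in local ticks.
    worst[k] is the worst-case Accept skew: (5.13), (5.33), (5.49), (5.65).
    observed[k] is the observed-skew bound: (5.8), (5.32), (5.48), (5.64).
    """
    s = drift_spread(rho0)
    # Accept skew when no asymmetric eligible voters are present:
    # (5.12), (5.34), (5.50), (5.66).
    sym = {k: eps_pp + s * delta[k] for k in (1, 2, 3, 4)}
    worst = {1: pi_p0 + sym[1]}
    observed = {1: floor((1 + rho0) * (pi_p0 + eps_pp))}
    for k in (2, 3, 4):
        worst[k] = sym[k - 1] + eps_pp + s * (send[k - 1] + delta[k])
        observed[k] = floor((1 + rho0) * (worst[k - 1] + eps_pp + s * send[k - 1]))
    return worst, observed


# Arbitrary sample values (assumptions), in nominal ticks.
rho0 = Fraction(1, 10_000)
worst, observed = stage_bounds(rho0, pi_p0=2, eps_pp=2,
                               delta={1: 4, 2: 4, 3: 4, 4: 4},
                               send={1: 6, 2: 6, 3: 6})
print(observed)                              # {1: 4, 2: 6, 3: 6, 4: 6}
print({k: float(v) for k, v in worst.items()})
```

From stage 2 onward the worst-case Accept skew no longer contains the initial skew bound of process P0, which is what lets the result stay bounded from stage to stage.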

5.6. Synchronization capture 

Figure 5.8 illustrates the detailed message flow graph for the synchronization-capture stages in a 3x3 
system. The nodes executing processes P3C and P4C are called recovering nodes. 


Figure 5.8: Detailed message flow graph for synchronization-capture stages in a 3x3 system 

5.6.1. Bound on the observed relative skew of received messages for process P3C 

Let $\Pi_{P3C,RCV}$ denote the bound on the relative skew observed in process P3C for the received 
messages from process P2 at trustworthy BIUs. $\Pi_{P3C,RCV}$ is measured in local clock ticks. $\Pi_{P3C,RCV}$ is 
used to check for agreement among the received inputs and also to check agreement with the result of the 
Accept output. 

The nodes in process P3C receive ECHO messages from process P2 at trustworthy BIUs in the same 
real-time range as nodes executing process P3. Therefore: 

$\Pi_{P3C,RCV} = \Pi_{P3,RCV}$ (5.68)

5.6.2. Relative skew of the Accept outputs for process P3C 

Recovering RMU nodes synchronize using the ECHO messages from process P2 of the
Synchronization Preservation protocol. Because the recovering nodes may have asymmetric faulty nodes
in their sets of eligible voters, all we know is that they will accept within the time range delimited by
ECHO messages from trustworthy BIU nodes. Let $\pi_{P3C,A}$ denote the bound on the real-time relative skew
of the Accept outputs in process P3C at the good recovering RMUs. Let $A_{P3C}$ denote the delay (in
local-clock ticks) of the Computation Process in process P3C, measured from the local time of reception of the
selected message to the local time when the Accept output is asserted. We assume that the delay of the
Computation Process in process P3C is the same as in process P3.

$A_{P3C} = A_{P3}$ (5.69)

The worst-case real-time relative skew occurs when the trustworthy BIU nodes and the recovering RMU
nodes simultaneously have asymmetric nodes in their sets of eligible voters. For that case:

$\pi_{P3C,A} = \pi_{P2,A} + \varepsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P2} + A_{P3C})$

$= 3\varepsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3})$ (5.70)


5.6.3. Bound on the observed relative skew of received messages for process P4C 

Let $\Pi_{P4C,RCV}$ denote the bound on the maximum relative skew observed in process P4C for the
received messages from process P3 at trustworthy RMUs. $\Pi_{P4C,RCV}$ is measured in local-clock ticks.
$\Pi_{P4C,RCV}$ is used to check for agreement among the received inputs and also to check agreement with the
result of the Accept output.

The nodes executing process P4C receive ECHO messages from process P3 at trustworthy RMUs in
the same time range as nodes executing process P4. Therefore:

$\Pi_{P4C,RCV} = \Pi_{P4,RCV}$ (5.71)

5.6.4. Relative skew of the Accept outputs for process P4C 

Recovering BIU nodes synchronize using the ECHO messages from process P3 of the
Synchronization Preservation protocol. Because the recovering BIUs may have asymmetric faulty nodes
in their sets of eligible voters, all we know is that they will accept within the time range delimited by
ECHO messages from trustworthy RMU nodes. Let $\pi_{P4C,A}$ denote the bound on the real-time relative
skew of the Accept outputs in process P4C at the good recovering BIUs. Let $A_{P4C}$ denote the delay (in
local-clock ticks) of the Computation Process in process P4C, measured from the local time of reception
of the selected message to the local time when the Accept output is asserted. We assume that the delay of
the Computation Process in process P4C is the same as in process P4.

$A_{P4C} = A_{P4}$ (5.72)

The worst-case real-time relative skew occurs when the trustworthy RMU nodes and the recovering BIU
nodes simultaneously have asymmetric nodes in their sets of eligible voters. For that case:

$\pi_{P4C,A} = \pi_{P3,A} + \varepsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P3} + A_{P4C})$

$= 3\varepsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4})$ (5.73)


5.7. Resetting the local time 

5.7.1. Relative skew of the local-time reset for process P4 

Let $T_{P4,A,j}$ denote the local time of the Accept output in process P4 at trustworthy BIU j. $H_{P4}$ denotes
the synchronization-reset delay applied by the BIU nodes resetting with respect to the Accept output in
process P4. $T_{P4,H,j}$ denotes the local time at which the next cycle begins for BIU node j synchronizing
with respect to process P4.

$T_{P4,H,j} = T_{P4,A,j} + H_{P4}$ (5.74)

$\pi_{P4,H}$ denotes the bound on the relative skew of the local-time reset for BIU nodes synchronizing with
respect to process P4. Then:

$\pi_{P4,H} = \pi_{P4,A} + [(1 + \rho_0) - 1/(1 + \rho_0)]H_{P4}$

$= 2\varepsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P3} + B_{P3} + A_{P4} + H_{P4})$ (5.75)


5.7.2. Relative skew of the local-time reset for process P4C 

Let $T_{P4C,A,j}$ denote the local time of the Accept output in process P4C at a good recovering BIU j. $H_{P4C}$
denotes the synchronization-reset delay applied by the nodes resetting with respect to the Accept output in
process P4C at the good recovering BIUs. $T_{P4C,H,j}$ denotes the local time at which the next cycle begins
for BIU node j synchronizing with respect to process P4C. The BIU nodes executing process P4C apply
the same synchronization-reset delay as the nodes executing process P4.

$H_{P4C} = H_{P4}$ (5.76)

So:

$T_{P4C,H,j} = T_{P4C,A,j} + H_{P4C} = T_{P4C,A,j} + H_{P4}$ (5.77)

The bound on the relative skew of the Accept output for process P4C at the good recovering BIUs is
given by $\pi_{P4C,A}$. $\pi_{P4C,H}$ denotes the bound on the relative skew of the local-time reset for good recovering
BIU nodes synchronizing with respect to process P4C. Then:

$\pi_{P4C,H} = \pi_{P4C,A} + [(1 + \rho_0) - 1/(1 + \rho_0)]H_{P4}$

$= 3\varepsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})$ (5.78)


5.7.3. Reset delay for process P3 

Let $T_{P3,A,i}$ denote the local time of the Accept output in process P3 at trustworthy RMU i. $H_{P3}$ denotes
the synchronization-reset delay applied by the nodes resetting with respect to the Accept output in process
P3. $T_{P3,H,i}$ denotes the local time at which the next cycle begins for RMU i synchronizing with respect to
process P3.

$T_{P3,H,i} = T_{P3,A,i} + H_{P3}$ (5.79)

$H_{P3}$ is the expected delay from the time when the RMU nodes in process P3 assert their Accept output
until the BIU nodes synchronizing with respect to process P4 reset their local-time clocks. The bound on
the relative skew of the Accept outputs in process P3 at the trustworthy RMUs is given by $\pi_{P3,A}$. $t_{P3,A,l}$ and
$t_{P3,A,h}$ denote the earliest and latest real times, respectively, at which the Accept outputs can be asserted in
process P3 at the trustworthy RMUs. So:

$\pi_{P3,A} = t_{P3,A,h} - t_{P3,A,l}$ (5.80)

$t_{P4,H,l|P3,A}$ and $t_{P4,H,h|P3,A}$ denote the earliest and latest real times, respectively, at which a trustworthy BIU
node synchronizing with respect to process P4 can reset its local-time clock. $t_{P4,H,l|P3,A}$ and $t_{P4,H,h|P3,A}$ are
measured with respect to the Accept outputs in process P3 at the trustworthy RMUs.

$t_{P4,H,l|P3,A} = t_{P3,A,l} + \Gamma_{PP,l} + (B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)$ (5.81)

$t_{P4,H,h|P3,A} = t_{P3,A,h} + \Gamma_{PP,h} + (1 + \rho_0)(B_{P3} + A_{P4} + H_{P4})$ (5.82)

Let $h_{P3,l}$ denote the minimum effective delay from the time the Accept output in process P3 at trustworthy
RMUs is asserted to the time a trustworthy BIU node resets its local-time clock with respect to process
P4.

$h_{P3,l} = t_{P4,H,l|P3,A} - t_{P3,A,h}$

$= \Gamma_{PP,l} + (B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0) - \pi_{P3,A}$ (5.83)

$h_{P3,h}$ denotes the maximum effective delay from the time the Accept output in process P3 at trustworthy
RMUs is asserted to the time a trustworthy BIU node resets its local-time clock with respect to process
P4.

$h_{P3,h} = t_{P4,H,h|P3,A} - t_{P3,A,l}$

$= \pi_{P3,A} + \Gamma_{PP,h} + (B_{P3} + A_{P4} + H_{P4})(1 + \rho_0)$ (5.84)

$H_{P3}$ is given by:

$H_{P3} = \mathrm{IMP}(h_{P3,l}, h_{P3,h})$ (5.85)

The real-time error for $T_{P3,H,i}$ is bounded as follows. A trustworthy BIU node can reset its local-time
clock with respect to process P4 no earlier than $\mu_{P3,H,l}$ nominal ticks from local time $T_{P3,H,i}$ at a trustworthy
RMU node synchronizing with respect to process P3.

$\mu_{P3,H,l} = (1 + \rho_0)H_{P3} - h_{P3,l}$ (5.86)

A trustworthy BIU node can reset its local-time clock with respect to process P4 no later than $\mu_{P3,H,h}$
nominal ticks from local time $T_{P3,H,i}$ at a trustworthy RMU node synchronizing with respect to process P3.

$\mu_{P3,H,h} = h_{P3,h} - H_{P3}/(1 + \rho_0)$ (5.87)

Note that this analysis also applies to the real-time error for $T_{P3,H,i}$ with respect to the local-time reset of
nodes synchronizing in process P4C.
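
The chain from Equation (5.83) to Equation (5.87) can be sketched in Python as follows. All names are hypothetical. The IMP function of Equation (5.85) is not defined in this section, so the default used here (the floor of the midpoint of its two arguments) is only an assumption; a caller can pass the intended definition through the `imp` parameter.

```python
import math


def reset_delay_p3(gamma_l, gamma_h, rho0, pi_p3_a, b_p3, a_p4, h_p4, imp=None):
    """Evaluate Eqs. (5.83)-(5.87) for the reset delay of process P3.

    gamma_l and gamma_h are the bounds of the point-to-point term used in
    Eqs. (5.81) and (5.82). The default `imp` is an assumed definition.
    """
    if imp is None:
        # Assumption: IMP returns the integer midpoint of its arguments.
        imp = lambda lo, hi: math.floor((lo + hi) / 2.0)
    ticks = b_p3 + a_p4 + h_p4
    h_lo = gamma_l + ticks / (1.0 + rho0) - pi_p3_a   # Eq. (5.83)
    h_hi = pi_p3_a + gamma_h + ticks * (1.0 + rho0)   # Eq. (5.84)
    h_p3 = imp(h_lo, h_hi)                            # Eq. (5.85)
    err_early = (1.0 + rho0) * h_p3 - h_lo            # Eq. (5.86)
    err_late = h_hi - h_p3 / (1.0 + rho0)             # Eq. (5.87)
    return h_p3, err_early, err_late
```

With no drift, no point-to-point uncertainty, and zero Accept skew, both error terms collapse to zero, which is a quick sanity check on the signs in Equations (5.86) and (5.87).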


5.7.4. Relative skew of the local-time reset between processes P3, and P4 or P4C 

Let $\pi_{P3-P4,H}$ denote the bound on the relative skew of the local-time reset between RMU nodes
synchronizing with respect to process P3 and BIU nodes synchronizing with respect to process P4.

$\pi_{P3-P4,H} = \max(\mu_{P3,H,l}, \mu_{P3,H,h})$ (5.88)

$\pi_{P3-P4C,H}$ denotes the bound on the relative skew of the local-time reset between RMU nodes
synchronizing with respect to process P3 and BIU nodes synchronizing with respect to process P4C.
$\pi_{P3-P4,H}$ also applies here.

$\pi_{P3-P4C,H} = \pi_{P3-P4,H}$ (5.89)


5.7.5. Relative skew of the local-time reset for process P3 

The bound on the relative skew of the Accept outputs in process P3 at trustworthy RMUs is given by
$\pi_{P3,A}$. $\pi_{P3,H}$ denotes the bound on the relative skew of the local-time reset for trustworthy RMU nodes
resetting with respect to the Accept output in process P3.

$\pi_{P3,H} = \pi_{P3,A} + [(1 + \rho_0) - 1/(1 + \rho_0)]H_{P3}$

$= 2\varepsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P2} + B_{P2} + A_{P3} + H_{P3})$ (5.90)


5.7.6. Relative skew of the local-time reset for process P3C 

Let $T_{P3C,A,i}$ denote the local time of the Accept output in process P3C at good recovering RMU i. $H_{P3C}$
denotes the synchronization-reset delay applied by the RMU nodes resetting with respect to the Accept
output in process P3C. $T_{P3C,H,i}$ denotes the local time at which the next cycle begins for RMU node i
synchronizing with respect to process P3C. The RMU nodes executing process P3C apply the same
synchronization-reset delay as the RMU nodes executing process P3.

$H_{P3C} = H_{P3}$ (5.91)

So:

$T_{P3C,H,i} = T_{P3C,A,i} + H_{P3C} = T_{P3C,A,i} + H_{P3}$ (5.92)

The bound on the relative skew for the Accept outputs in process P3C at the good recovering RMUs is
given by $\pi_{P3C,A}$. $\pi_{P3C,H}$ denotes the bound on the relative skew of the local-time reset for the nodes
synchronizing with respect to the Accept output in process P3C.

$\pi_{P3C,H} = \pi_{P3C,A} + [(1 + \rho_0) - 1/(1 + \rho_0)]H_{P3}$

$= 3\varepsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + H_{P3})$ (5.93)


5.7.7. Reset delay for process P2 

Let $T_{P2,A,k}$ denote the local time of the Accept output in process P2 at trustworthy BIU k. $H_{P2}$ denotes
the synchronization-reset delay applied by the BIU nodes resetting with respect to the Accept output in
process P2. $T_{P2,H,k}$ denotes the local time at which the next cycle begins for BIU node k synchronizing
with respect to process P2.

$T_{P2,H,k} = T_{P2,A,k} + H_{P2}$ (5.94)

$H_{P2}$ is the expected delay from the time when the BIU nodes executing process P2 assert their Accept
outputs until the RMU nodes synchronizing with respect to process P3 reset their local-time clocks.
$t_{P3,H,l|P2,A}$ denotes the earliest real time at which a trustworthy RMU node synchronizing with respect to
process P3 can reset its local-time clock, measured with respect to the Accept outputs in process P2 at the
trustworthy BIUs.

$t_{P3,H,l|P2,A} = t_{P2,A,l} + \Gamma_{PP,l} + (B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0)$ (5.95)

Let $t_{P3,H,h|P2,A}$ denote the latest real time at which a trustworthy RMU node synchronizing with respect to
process P3 can reset its local-time clock, measured with respect to the Accept outputs in process P2 at the
trustworthy BIUs.

$t_{P3,H,h|P2,A} = t_{P2,A,h} + \Gamma_{PP,h} + (B_{P2} + A_{P3} + H_{P3})(1 + \rho_0)$ (5.96)

Let $h_{P2,l}$ denote the minimum effective delay from the time a trustworthy BIU node in process P2 asserts
its Accept output until a trustworthy RMU node in process P3 resets its local-time clock.

$h_{P2,l} = t_{P3,H,l|P2,A} - t_{P2,A,h}$

$= \Gamma_{PP,l} + (B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0) - \pi_{P2,A}$ (5.97)

Let $h_{P2,h}$ denote the maximum effective delay from the time a trustworthy BIU node in process P2 asserts
its Accept output until a trustworthy RMU node in process P3 resets its local-time clock.

$h_{P2,h} = t_{P3,H,h|P2,A} - t_{P2,A,l}$

$= \pi_{P2,A} + \Gamma_{PP,h} + (B_{P2} + A_{P3} + H_{P3})(1 + \rho_0)$ (5.98)

$H_{P2}$ is given by:

$H_{P2} = \mathrm{IMP}(h_{P2,l}, h_{P2,h})$ (5.99)

The real-time error for $T_{P2,H,k}$ is bounded as follows. A trustworthy RMU node in process P3 can reset its
local-time clock no earlier than $\mu_{P2,H,l}$ nominal ticks from local time $T_{P2,H,k}$ at a BIU node synchronizing
with respect to process P2.

$\mu_{P2,H,l} = (1 + \rho_0)H_{P2} - h_{P2,l}$ (5.100)

A trustworthy RMU node in process P3 can reset its local-time clock no later than $\mu_{P2,H,h}$ nominal ticks
from local time $T_{P2,H,k}$ at a BIU node synchronizing with respect to process P2.

$\mu_{P2,H,h} = h_{P2,h} - H_{P2}/(1 + \rho_0)$ (5.101)

Note that this analysis also applies to the real-time error for $T_{P2,H,k}$ with respect to the local-time reset of
nodes synchronizing with respect to process P3C.


5.7.8. Relative skew of the local-time reset between processes P2, and P3 or P3C 

Let $\pi_{P2-P3,H}$ denote the bound on the relative skew of the local-time reset between trustworthy BIU
nodes synchronizing with respect to process P2 and trustworthy RMU nodes synchronizing with respect
to process P3.

$\pi_{P2-P3,H} = \max(\mu_{P2,H,l}, \mu_{P2,H,h})$ (5.102)

$\pi_{P2-P3C,H}$ denotes the bound on the relative skew of the local-time reset between trustworthy BIU nodes
synchronized with respect to process P2 and good recovering RMU nodes synchronized with respect to
process P3C. $\pi_{P2-P3,H}$ also applies here.

$\pi_{P2-P3C,H} = \pi_{P2-P3,H}$ (5.103)


5.7.9. Relative skew of the local-time reset for process P2 

The bound on the relative skew of the Accept outputs in process P2 at trustworthy BIUs is given by
$\pi_{P2,A}$. $\pi_{P2,H}$ denotes the bound on the relative skew of the local-time reset for trustworthy BIU nodes
synchronizing with respect to process P2. Then:

$\pi_{P2,H} = \pi_{P2,A} + [(1 + \rho_0) - 1/(1 + \rho_0)]H_{P2}$

$= 2\varepsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P1} + B_{P1} + A_{P2} + H_{P2})$ (5.104)


5.7.10. Relative skew of the local-time reset for a set including processes P2 and P3C

Let $t_{P3C,H,l|P2,A}$ denote the earliest real time at which a good recovering RMU node synchronizing with
respect to process P3C can reset its local-time clock, measured with respect to the Accept outputs in
process P2 at trustworthy BIUs.

$t_{P3C,H,l|P2,A} = t_{P2,A,l} + \Gamma_{PP,l} + (B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0)$ (5.105)

$t_{P3C,H,h|P2,A}$ denotes the latest real time at which a good recovering RMU node synchronizing with respect
to process P3C can reset its local-time clock, measured with respect to the Accept outputs in process P2 at
trustworthy BIUs.

$t_{P3C,H,h|P2,A} = t_{P2,A,h} + \Gamma_{PP,h} + (1 + \rho_0)(B_{P2} + A_{P3} + H_{P3})$ (5.106)

$t_{P2,H,l}$ denotes the earliest real time at which a trustworthy BIU node synchronizing with respect to process
P2 can reset its local-time clock.

$t_{P2,H,l} = t_{P2,A,l} + H_{P2}/(1 + \rho_0)$ (5.107)

$t_{P2,H,h}$ denotes the latest real time at which a trustworthy BIU node synchronizing with respect to process
P2 can reset its local-time clock.

$t_{P2,H,h} = t_{P2,A,h} + H_{P2}(1 + \rho_0)$ (5.108)

$\pi_{P2+P3C,H}$ denotes the bound on the relative skew of the local-time reset for a node set including all the
trustworthy or good recovering nodes synchronizing with respect to process P2 or P3C.

$\pi_{P2+P3C,H} = \max(\lvert t_{P3C,H,h|P2,A} - t_{P2,H,l} \rvert, \lvert t_{P2,H,h} - t_{P3C,H,l|P2,A} \rvert, \pi_{P2,H}, \pi_{P3C,H})$ (5.109)
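
Equation (5.109) combines two groups of nodes by comparing the extremes of their reset windows with the skew inside each group. A minimal Python sketch of that pattern is given below; the function and argument names are hypothetical. Group "a" stands for the P2-synchronized nodes and group "b" for the P3C-synchronized nodes, with all times measured from the same reference.

```python
def combined_reset_skew(a_early, a_late, b_early, b_late, pi_a, pi_b):
    """Bound on the local-time reset skew for the union of two node groups.

    a_early/a_late and b_early/b_late are the earliest and latest real times
    at which nodes of each group can reset; pi_a and pi_b are the skew bounds
    within each group. Mirrors the four-term maximum of Eq. (5.109).
    """
    return max(abs(b_late - a_early), abs(a_late - b_early), pi_a, pi_b)
```

The same four-term maximum appears in Equations (5.113), (5.120), and (5.126), so a single helper of this shape would cover all of the pairwise combinations.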


5.7.11. Relative skew of the local-time reset for a set including processes P2 and P3 

Let $\pi_{P2+P3,H}$ denote the bound on the relative skew of the local-time reset for a node set including all
trustworthy BIU nodes synchronizing with respect to process P2 or P3. With respect to process P2, the
Accept outputs in process P3 at the trustworthy RMUs and the Accept outputs in process P3C at the good
recovering RMUs can be asserted during the same real-time interval. In the presence of asymmetric
faulty BIU nodes, we know that the time interval of the Accept outputs in process P3 at the trustworthy
RMUs is contained within the time interval of the Accept outputs in process P3C at the good recovering
RMUs. Therefore:

$\pi_{P2+P3,H} \leq \pi_{P2+P3C,H}$ (5.110)


5.7.12. Relative skew of the local-time reset for a set including processes P2 and P4C 

Let $t_{P4C,H,l|P2,A}$ denote the earliest real time at which a good recovering BIU node synchronizing with
respect to process P4C can reset its local-time clock, measured with respect to the Accept outputs in
process P2 at the trustworthy BIUs.

$t_{P4C,H,l|P2,A} = t_{P2,A,l} + 2\Gamma_{PP,l} + (B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)$ (5.111)

$t_{P4C,H,h|P2,A}$ denotes the latest real time at which a good recovering BIU node synchronizing with respect to
process P4C can reset its local-time clock, measured with respect to the Accept outputs in process P2 at
the trustworthy BIUs.

$t_{P4C,H,h|P2,A} = t_{P2,A,h} + 2\Gamma_{PP,h} + (1 + \rho_0)(B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})$ (5.112)

$\pi_{P2+P4C,H}$ denotes the bound on the relative skew of the local-time reset for a node set including all BIU
nodes synchronizing with respect to process P2 or P4C.

$\pi_{P2+P4C,H} = \max(\lvert t_{P4C,H,h|P2,A} - t_{P2,H,l} \rvert, \lvert t_{P2,H,h} - t_{P4C,H,l|P2,A} \rvert, \pi_{P2,H}, \pi_{P4C,H})$ (5.113)


5.7.13. Relative skew of the local-time reset for a set including processes P2 and P4 

Let $\pi_{P2+P4,H}$ denote the bound on the relative skew of the local-time reset for a node set including all
trustworthy BIU nodes synchronizing with respect to process P2 or P4. With respect to process P2, the
Accept outputs in process P4 at the trustworthy BIUs and the Accept outputs in process P4C at the good
recovering BIUs can be asserted during the same real-time interval. In the presence of asymmetric faulty
RMU nodes, we know that the time interval of the Accept outputs in process P4 at the trustworthy BIUs
is contained within the time interval of the Accept outputs in process P4C at the good recovering BIUs.
Therefore:

$\pi_{P2+P4,H} \leq \pi_{P2+P4C,H}$ (5.114)


5.7.14. Relative skew of the local-time reset for a set including processes P3 or P3C 

Let $\pi_{P3+P3C,H}$ denote the bound on the relative skew of the local-time reset for a node set including all
trustworthy or good recovering RMU nodes synchronizing with respect to process P3 or P3C. With
respect to process P2, the Accept outputs in process P3 at the trustworthy RMUs and the Accept outputs
in process P3C at the good recovering RMUs can be asserted during the same real-time interval. This
interval is determined by the time range during which the trustworthy BIU nodes executing process P2
send ECHO. In the presence of asymmetric faulty BIU nodes, the good recovering RMU nodes executing
process P3C may not be able to synchronize any better than the duration of this time range.

$\pi_{P3+P3C,H} = \max(\pi_{P3,H}, \pi_{P3C,H}) = \pi_{P3C,H}$ (5.115)


5.7.15. Relative skew of the local-time reset for a set including processes P3 and P4C 

Let $t_{P4C,H,l|P3,A}$ denote the earliest real time at which a good recovering BIU node synchronizing with
respect to process P4C can reset its local-time clock, measured with respect to the Accept outputs in
process P3 at the trustworthy RMUs.

$t_{P4C,H,l|P3,A} = t_{P3,A,l} + \Gamma_{PP,l} + (B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)$ (5.116)

$t_{P4C,H,h|P3,A}$ denotes the latest real time at which a good recovering BIU node synchronizing with respect to
process P4C can reset its local-time clock, measured with respect to the Accept outputs in process P3 at
the trustworthy RMUs.

$t_{P4C,H,h|P3,A} = t_{P3,A,h} + \Gamma_{PP,h} + (1 + \rho_0)(B_{P3} + A_{P4} + H_{P4})$ (5.117)

$t_{P3,H,l|P3,A}$ denotes the earliest real time at which a trustworthy RMU node synchronizing with respect to
process P3 can reset its local-time clock, measured with respect to the Accept outputs in process P3 at the
trustworthy RMUs.

$t_{P3,H,l|P3,A} = t_{P3,A,l} + H_{P3}/(1 + \rho_0)$ (5.118)

$t_{P3,H,h|P3,A}$ denotes the latest real time at which a trustworthy RMU node synchronizing with respect to
process P3 can reset its local-time clock, measured with respect to the Accept outputs in process P3 at the
trustworthy RMUs.

$t_{P3,H,h|P3,A} = t_{P3,A,h} + H_{P3}(1 + \rho_0)$ (5.119)

$\pi_{P3+P4C,H}$ denotes the bound on the relative skew of the local-time reset for a node set including all
trustworthy or good recovering nodes synchronizing with respect to process P3 or P4C.

$\pi_{P3+P4C,H} = \max(\lvert t_{P4C,H,h|P3,A} - t_{P3,H,l|P3,A} \rvert, \lvert t_{P3,H,h|P3,A} - t_{P4C,H,l|P3,A} \rvert, \pi_{P3,H}, \pi_{P4C,H})$ (5.120)


5.7.16. Relative skew of the local-time reset for a set including processes P3 and P4 

Let $\pi_{P3+P4,H}$ denote the bound on the relative skew of the local-time reset for a node set including all
BIU nodes synchronizing with respect to process P3 or P4. With respect to process P3, the Accept
outputs in process P4 at the trustworthy BIUs and the Accept outputs in process P4C at the good
recovering BIUs can be asserted during the same real-time interval. In the presence of asymmetric faulty
RMU nodes, we know that the time interval of the Accept outputs in process P4 at the trustworthy BIU
nodes is contained within the time interval of the Accept outputs in process P4C at the good recovering
BIU nodes. Therefore:

$\pi_{P3+P4,H} \leq \pi_{P3+P4C,H}$ (5.121)


5.7.17. Relative skew of the local-time reset for a set including processes P3C and P4C 

Let $t_{P3C,H,l|P2,A}$ denote the earliest real time at which a good recovering RMU node synchronizing with
respect to process P3C can reset its local-time clock, measured with respect to the Accept outputs in
process P2 at the trustworthy BIUs.

$t_{P3C,H,l|P2,A} = t_{P2,A,l} + \Gamma_{PP,l} + (B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0)$ (5.122)

$t_{P3C,H,h|P2,A}$ denotes the latest real time at which a good recovering RMU node synchronizing with respect
to process P3C resets its local-time clock, measured with respect to the Accept outputs in process P2.

$t_{P3C,H,h|P2,A} = t_{P2,A,h} + \Gamma_{PP,h} + (1 + \rho_0)(B_{P2} + A_{P3} + H_{P3})$ (5.123)

$t_{P4C,H,l|P2,A}$ denotes the earliest real time at which a good recovering BIU node synchronizing with respect to
process P4C can reset its local-time clock, measured with respect to the Accept outputs in process P2 at the
trustworthy BIUs.

$t_{P4C,H,l|P2,A} = t_{P2,A,l} + 2\Gamma_{PP,l} + (B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)$ (5.124)

$t_{P4C,H,h|P2,A}$ denotes the latest real time at which a good recovering BIU node synchronizing with respect to
process P4C can reset its local-time clock, measured with respect to the Accept outputs in process P2 at
the trustworthy BIUs.

$t_{P4C,H,h|P2,A} = t_{P2,A,h} + 2\Gamma_{PP,h} + (1 + \rho_0)(B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})$ (5.125)

$\pi_{P3C+P4C,H}$ denotes the bound on the relative skew of the local-time reset for a node set including all good
recovering nodes synchronizing with respect to process P3C or P4C.

$\pi_{P3C+P4C,H} = \max(\lvert t_{P4C,H,h|P2,A} - t_{P3C,H,l|P2,A} \rvert, \lvert t_{P3C,H,h|P2,A} - t_{P4C,H,l|P2,A} \rvert, \pi_{P3C,H}, \pi_{P4C,H})$ (5.126)


5.7.18. Relative skew of the local-time reset for a set including processes P4 and P4C 

Let $\pi_{P4+P4C,H}$ denote the bound on the relative skew of the local-time reset for a node set including all
trustworthy or good recovering BIUs synchronizing with respect to process P4 or P4C. With respect to
process P3, the Accept outputs in process P4 at the trustworthy BIUs and the Accept outputs in process
P4C at the good recovering BIUs can be asserted during the same real-time interval. This interval is
determined by the time range during which the trustworthy RMU nodes executing process P3 send their
ECHO messages. In the presence of asymmetric faulty RMU nodes, the good recovering BIUs executing
process P4C may not be able to synchronize any better than the duration of this time range.

$\pi_{P4+P4C,H} = \max(\pi_{P4,H}, \pi_{P4C,H}) = \pi_{P4C,H}$ (5.127)


5.7.19. Relative skew of the local-time reset for a set including all the synchronizing nodes 

Let $\pi_{ALL,H}$ denote the upper bound on the relative skew of the local-time reset for all the trustworthy or
good recovering nodes executing the synchronization protocol. The following relations allow us to
reduce the number of relative skews that must be considered:

$\pi_{P2+P3C,H} \geq \pi_{P2,H}$ and $\pi_{P2+P3C,H} \geq \pi_{P3C,H}$ (5.128)

$\pi_{P2+P3C,H} \geq \pi_{P2+P3,H}$ (5.129)

$\pi_{P2+P4C,H} \geq \pi_{P2,H}$ and $\pi_{P2+P4C,H} \geq \pi_{P4C,H}$ (5.130)

$\pi_{P2+P4C,H} \geq \pi_{P2+P4,H}$ (5.131)

$\pi_{P3+P4C,H} \geq \pi_{P3,H}$ and $\pi_{P3+P4C,H} \geq \pi_{P4C,H}$ (5.132)

$\pi_{P3+P4C,H} \geq \pi_{P3+P4,H}$ (5.133)

$\pi_{P3C+P4C,H} \geq \pi_{P3C,H}$ and $\pi_{P3C+P4C,H} \geq \pi_{P4C,H}$ (5.134)

So:

$\pi_{ALL,H} = \max(\pi_{P2+P3C,H}, \pi_{P2+P4C,H}, \pi_{P3+P4C,H}, \pi_{P3C+P4C,H})$ (5.135)


5.8. Relative local-time skews for source-receiver pairs


5.8.1. Duration of the synchronization protocol execution 

From a global perspective, the execution of the synchronization protocol ends when all the trustworthy
and good recovering nodes have reset their local-time clocks. $t_{sync,l}$ and $t_{sync,h}$ denote the earliest and latest
times, respectively, at which a trustworthy BIU node begins to execute the synchronization protocol. $\pi_{P0}$
denotes the bound on the relative local-time skew for the trustworthy BIU nodes executing process P0.

$\pi_{P0} = t_{sync,h} - t_{sync,l}$ (5.136)

$t_{sync,P2,H,l}$ and $t_{sync,P2,H,h}$ denote the earliest and latest times, respectively, at which a trustworthy BIU node
synchronizing with respect to process P2 can reset its local-time clock.

$t_{sync,P2,H,l} = t_{sync,l} + 2\Gamma_{PP,l} + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + H_{P2})/(1 + \rho_0)$ (5.137)

$t_{sync,P2,H,h} = t_{sync,h} + 2\Gamma_{PP,h} + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + H_{P2})$ (5.138)

$t_{sync,P3,H,l}$ and $t_{sync,P3,H,h}$ denote the earliest and latest times, respectively, at which a trustworthy RMU node
synchronizing with respect to process P3 can reset its local-time clock.

$t_{sync,P3,H,l} = t_{sync,l} + 3\Gamma_{PP,l} + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0)$ (5.139)

$t_{sync,P3,H,h} = t_{sync,h} + 3\Gamma_{PP,h} + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + H_{P3})$ (5.140)

$t_{sync,P4,H,l}$ and $t_{sync,P4,H,h}$ denote the earliest and latest times, respectively, at which a trustworthy BIU node
synchronizing with respect to process P4 can reset its local-time clock.

$t_{sync,P4,H,l} = t_{sync,l} + 4\Gamma_{PP,l} + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)$ (5.141)

$t_{sync,P4,H,h} = t_{sync,h} + 4\Gamma_{PP,h} + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})$ (5.142)

$t_{sync,P3C,H,l}$ and $t_{sync,P3C,H,h}$ denote the earliest and latest times, respectively, at which a good recovering
RMU node synchronizing with respect to process P3C can reset its local-time clock.

$t_{sync,P3C,H,l} = t_{sync,P3,H,l}$ (5.143)

$t_{sync,P3C,H,h} = t_{sync,P3,H,h}$ (5.144)

$t_{sync,P4C,H,l}$ and $t_{sync,P4C,H,h}$ denote the earliest and latest times, respectively, at which a good recovering BIU
node synchronizing with respect to process P4C can reset its local-time clock.

$t_{sync,P4C,H,l} = t_{sync,P4,H,l}$ (5.145)

$t_{sync,P4C,H,h} = t_{sync,P4,H,h}$ (5.146)


$\pi_{ALL}$ denotes the bound on the relative local-time skew for all the nodes participating in the execution of
the synchronization protocol. The calculation of $\pi_{ALL}$ does not include the nodes executing the
synchronization-capture processes. $\delta_{sync}|_{min}$ and $\delta_{sync}|_{max}$ denote lower and upper bounds, respectively, on
the real-time duration of the execution of the synchronization protocol for the trustworthy nodes. $\delta_{sync}|_{min}$
is measured from the latest time at which a trustworthy node begins to execute the protocol to the earliest
time at which a trustworthy node resets its local-time clock. We choose the following value for $\delta_{sync}|_{min}$:

$\delta_{sync}|_{min} = -(\pi_{ALL} - \pi_{P0}) + \min[(t_{sync,P2,H,l} - t_{sync,h}), (t_{sync,P3,H,l} - t_{sync,h}), (t_{sync,P4,H,l} - t_{sync,h})]$ (5.147)

$\delta_{sync}|_{max}$ is measured from the earliest time at which a trustworthy node begins to execute the protocol to
the latest time at which a trustworthy node resets its local-time clock. We choose the following value for
$\delta_{sync}|_{max}$:

$\delta_{sync}|_{max} = (\pi_{ALL} - \pi_{P0}) + \max[(t_{sync,P2,H,h} - t_{sync,l}), (t_{sync,P3,H,h} - t_{sync,l}), (t_{sync,P4,H,h} - t_{sync,l})]$ (5.148)

We define the following variables in order to simplify these expressions for $\delta_{sync}|_{min}$ and $\delta_{sync}|_{max}$.

$A_{sync,P2,H,l} = 2\Gamma_{PP,l} + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + H_{P2})/(1 + \rho_0)$ (5.149)

$A_{sync,P3,H,l} = 3\Gamma_{PP,l} + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0)$ (5.150)

$A_{sync,P4,H,l} = 4\Gamma_{PP,l} + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)$ (5.151)

$A_{sync,P2,H,h} = 2\Gamma_{PP,h} + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + H_{P2})$ (5.152)

$A_{sync,P3,H,h} = 3\Gamma_{PP,h} + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + H_{P3})$ (5.153)

$A_{sync,P4,H,h} = 4\Gamma_{PP,h} + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})$ (5.154)

Then:

$\delta_{sync}|_{min} = -\pi_{ALL} + \min(A_{sync,P2,H,l}, A_{sync,P3,H,l}, A_{sync,P4,H,l})$ (5.155)

$\delta_{sync}|_{max} = \pi_{ALL} + \max(A_{sync,P2,H,h}, A_{sync,P3,H,h}, A_{sync,P4,H,h})$ (5.156)
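
The simplified bounds of Equations (5.155) and (5.156) can be evaluated with a short Python sketch such as the one below. The names are hypothetical. The argument `stage_ticks` is an assumed convenience structure: it maps the number of point-to-point hops (2, 3, or 4, for P2, P3, and P4) to the sum of the process delays that appears in the corresponding one of Equations (5.149) to (5.154).

```python
def sync_duration_bounds(pi_all, gamma_l, gamma_h, rho0, stage_ticks):
    """Return (lower, upper) bounds on the duration of the protocol execution.

    stage_ticks maps hop count -> summed Send, Computation, and reset delays
    (in local-clock ticks) along the path that ends in that reset.
    """
    earliest = [hops * gamma_l + ticks / (1.0 + rho0)      # Eqs. (5.149)-(5.151)
                for hops, ticks in stage_ticks.items()]
    latest = [hops * gamma_h + (1.0 + rho0) * ticks        # Eqs. (5.152)-(5.154)
              for hops, ticks in stage_ticks.items()]
    lower = -pi_all + min(earliest)                        # Eq. (5.155)
    upper = pi_all + max(latest)                           # Eq. (5.156)
    return lower, upper
```

For example, `sync_duration_bounds(1.0, 1.0, 2.0, 0.0, {2: 10, 3: 16, 4: 22})` uses arbitrary placeholder values, not ROBUS-2 parameters, and yields a lower bound driven by the P2 path and an upper bound driven by the P4 path.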


5.8.2. Bounds on the resynchronization period 

Let $\delta_{SP}|_{min}$ and $\delta_{SP}|_{max}$ denote the values of $\delta_{sync}|_{min}$ and $\delta_{sync}|_{max}$, respectively, for the Synchronization
Preservation protocol. $T_{SP}$ denotes the scheduled local time to begin the execution of the Synchronization
Preservation protocol. $p_{min}$ denotes a lower bound on the real-time duration of a synchronization cycle.
$p_{min}$ is measured from the time of the synchronization reset in one cycle to the time of the synchronization
reset in the next.

$p_{min} = T_{SP}/(1 + \rho_0) + \delta_{SP}|_{min}$ (5.157)

$p_{max}$ denotes an upper bound for the real-time duration of a synchronization cycle. $p_{max}$ is measured from
the time of the synchronization reset in one cycle to the time of the synchronization reset in the next.

$p_{max} = (1 + \rho_0)T_{SP} + \delta_{SP}|_{max}$ (5.158)

P denotes the nominal resynchronization period for the analysis of relative skews. P is measured in units
of local-clock ticks. We want a count of P local-clock ticks to be at least as large as the maximum
duration of a synchronization cycle measured in nominal ticks. This constraint is captured by the
following expression:

$P/(1 + \rho_0) \geq p_{max}$ (5.159)

So:

$P \geq (1 + \rho_0)p_{max}$ (5.160)

We choose P to be the smallest integer that satisfies the previous inequality.

$P = \lceil (1 + \rho_0)p_{max} \rceil$ (5.161)
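
As a worked illustration of Equations (5.158) and (5.161), the following Python sketch computes the nominal resynchronization period. The function name and the sample values in the usage note are hypothetical and are not taken from the ROBUS-2 design.

```python
import math


def resync_period(t_sp, delta_sp_max, rho0):
    """Smallest integer P (in local-clock ticks) with P/(1 + rho0) >= p_max."""
    p_max = (1.0 + rho0) * t_sp + delta_sp_max   # Eq. (5.158)
    return math.ceil((1.0 + rho0) * p_max)       # Eq. (5.161)
```

With a scheduled start time of 1000 ticks, an upper duration bound of 50, and a drift bound of 1e-4, the product $(1 + \rho_0)p_{max}$ is slightly above 1050.2, so the sketch rounds up to P = 1051.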


5.8.3. Relative skew between P2-synchronized BIUs and P3- or P3C-synchronized RMUs

Let $\pi_{P2-P3}$ denote the bound on the relative local-time skew during the synchronization cycle for
trustworthy BIU nodes synchronized with respect to process P2 and trustworthy RMU nodes
synchronized with respect to process P3.

$\pi_{P2-P3} = \pi_{P2-P3,H} + [(1 + \rho_0) - 1/(1 + \rho_0)]P$ (5.162)

$\pi_{P2-P3C}$ denotes the bound on the relative local-time skew during the synchronization cycle for trustworthy
BIU nodes synchronized with respect to process P2 and good recovering RMU nodes synchronized with
respect to process P3C. $\pi_{P2-P3}$ also applies here.

$\pi_{P2-P3C} = \pi_{P2-P3}$ (5.163)


5.8.4. Relative skew between P3-synchronized RMUs and P4- or P4C-synchronized BIUs 

Let $\pi_{P3-P4}$ denote the bound on the relative local-time skew during the synchronization cycle for
trustworthy RMU nodes synchronized with respect to process P3 and trustworthy BIU nodes
synchronized with respect to process P4. Then:

$\pi_{P3-P4} = \pi_{P3-P4,H} + [(1 + \rho_0) - 1/(1 + \rho_0)]P$ (5.164)

$\pi_{P3-P4C}$ denotes the bound on the relative local-time skew during the synchronization cycle for trustworthy
RMU nodes synchronized with respect to process P3 and good recovering BIU nodes synchronized with
respect to process P4C. $\pi_{P3-P4}$ also applies here.

$\pi_{P3-P4C} = \pi_{P3-P4}$ (5.165)


5.8.5. Bound on the relative local-time skew for all the nodes executing the synchronization
protocol

$\pi_{SP,ALL}$ denotes the value of $\pi_{ALL}$ for Synchronization Preservation.

$$\pi_{SP,ALL} = \pi_{ALL,H} + [(1 + \rho_0) - 1/(1 + \rho_0)]P \tag{5.166}$$


5.8.6. Generic relative local-time skew between sources and receivers for synchronous 
communication 

For synchronized operations, we would like to use a single value of the relative local-time skew 
between sources and receivers for all point-to-point communication. $\pi_{PP,SR}$ denotes the common bound on
the relative local-time skew between sources and receivers for synchronized communication. From the
preceding analysis, there are only two particular source-receiver cases that need to be considered to
determine a common skew bound: the skew between P2-synchronized nodes and P3-synchronized nodes
(i.e., $\pi_{P2-P3}$), and the skew between P3-synchronized nodes and P4-synchronized nodes (i.e., $\pi_{P3-P4}$). We
choose $\pi_{PP,SR}$ to be the larger of the two.

$$\pi_{PP,SR} = \max(\pi_{P2-P3},\ \pi_{P3-P4}) \tag{5.167}$$
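The skew bounds in Sections 5.8.3 through 5.8.6 all share the same form: a base value plus a drift term that grows with the resynchronization period P. The Python sketch below shows that shared structure and the selection of the common bound. The function and argument names are illustrative assumptions, not identifiers from the ROBUS-2 design.

```python
def drift_term(rho0, period):
    """Skew accumulated over `period` local ticks: [(1 + rho0) - 1/(1 + rho0)] * P."""
    return ((1 + rho0) - 1 / (1 + rho0)) * period

def skew_bound(pi_h, rho0, period):
    """Bound of the form used in (5.162), (5.164) and (5.166)."""
    return pi_h + drift_term(rho0, period)

def common_skew_bound(pi_p2_p3_h, pi_p3_p4_h, rho0, period):
    """pi_PP,SR = max(pi_P2-P3, pi_P3-P4), as in (5.167)."""
    return max(skew_bound(pi_p2_p3_h, rho0, period),
               skew_bound(pi_p3_p4_h, rho0, period))
```

Because the drift term is identical for both cases, the larger base value always determines the common bound.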


5.9. Specifying the Computation Process and Send Process delays 

A goal of this ROBUS version is to achieve nearly the same tightness for the relative local-time skew 
when executing the Synchronization Preservation, Initial Synchronization, and Synchronization Capture 
protocols. 

The Synchronization Preservation and Initial Synchronization protocols can be decomposed into two 
major phases: agreement generation and agreement propagation. The agreement generation phase 
includes the first two stages of the protocol from the Send Process in P0 to the Computation Process in
P2. In this phase, the relative skew goes from a bounded initial value denoted by $\pi_{P0}$ to a relative skew of
the Accept outputs denoted by $\pi_{P2,A}$, which is independent of $\pi_{P0}$ but dependent on the process delays.
The agreement propagation phase includes the last two stages of the protocol from the Send Process in P2 
to the Computation Process in P4, including the Computation Processes in P3C and P4C for the 
Synchronization Capture protocol. The synchronization-reset delays are applied with respect to the 
Accept outputs in processes P2, P3, P3C, P4, and P4C. The process delays for this second phase of the 
protocol are important determinants of the final relative local-time skew. 

The approach taken to determine the process delays for the synchronization protocols in this version of 
the ROBUS is as follows. Since we expect the value of $\pi_{P0}$ to be different for the Synchronization
Preservation and the Initial Synchronization protocols, we specify the Send Process delay for process P0
(i.e., $B_{P0}$) independently for each protocol according to the particular timing requirements of the protocol.
To ensure that all the versions of the protocol achieve approximately the same relative skew, we compute
one set of Computation Process and Send Process delays for the synchronization processes from P1 on.
These delays must be used by all the synchronization protocols. 

An additional consideration is the constraint on the minimum data-introduction interval (DII) for the 
send port of the Communication Module, $\Delta_{Comm}$. This constraint applies to the BIUs and the RMUs, and
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is satisfied by adding functional requirements to the Send Processes at the BIUs and the RMUs. The 
details are described next. 


5.9.1. Computation Process delays 

The Computation Process delay is decomposed into two parts: the reception delay in the Reception 
Process and the computation delay in the Accept Process. For the case of the Synchronization 
Preservation protocol, the reception delay is the delay allocated to ensure that all valid messages are 
received before the computation begins. This delay is similar to the deskewing window applied for the 
synchronous point-to-point communication. A fundamental difference between the reception delay for 
the synchronization protocols and the deskewing window for the synchronous protocols is that in the 
synchronization protocols the relative time spacing between received messages is preserved when 
forwarding the messages to the computation, while in the synchronous protocols the messages are 
accumulated and forwarded at the same time. 

To specify the reception delay, we consider the timing of reception in the Synchronization 
Preservation protocol. The timing of reception in the Initial Synchronization protocol is not considered 
because in that protocol the uncertainty in the time of reception can be extremely large, especially for 
processes P1 and P2, which would result in very large delays for the protocol. Having a quick execution
is very important for the Synchronization Preservation protocol since the duration of the protocol 
determines how much time is available to execute the synchronous protocols for a given 
resynchronization period. 

Let $\Delta_{IS,P1}$, $\Delta_{IS,P2}$, $\Delta_{IS,P3}$, and $\Delta_{IS,P4}$ denote the Computation Process delays for processes P1, P2, P3, and
P4 of the Initial Synchronization protocol, respectively. $\Delta_{SP,P1}$, $\Delta_{SP,P2}$, $\Delta_{SP,P3}$, and $\Delta_{SP,P4}$ denote the
Computation Process delays for processes P1, P2, P3, and P4 of the Synchronization Preservation
protocol, respectively. $\Delta_{SC,P3C}$ and $\Delta_{SC,P4C}$ denote the Computation Process delays for processes P3C and
P4C of the Synchronization Capture protocol, respectively. All the synchronization protocols have the
same Computation Process delays.

$$\Delta_{P1} = \Delta_{IS,P1} = \Delta_{SP,P1} \tag{5.168}$$

$$\Delta_{P2} = \Delta_{IS,P2} = \Delta_{SP,P2} \tag{5.169}$$

$$\Delta_{P3} = \Delta_{IS,P3} = \Delta_{SP,P3} = \Delta_{SC,P3C} \tag{5.170}$$

$$\Delta_{P4} = \Delta_{IS,P4} = \Delta_{SP,P4} = \Delta_{SC,P4C} \tag{5.171}$$


For process P1 of the Synchronization Preservation protocol, the expected time range of reception is as
follows (this interval includes all the clock edges at which valid messages can arrive):

$$[\,T_{SP,P1,RCV,E} - \Delta_{PP,RCV}|_{abs-max},\ T_{SP,P1,RCV,E} + \Delta_{PP,RCV}|_{abs-max}\,] \tag{5.172}$$

$W_{SP,P1}$ denotes the reception delay applied in process P1. For the Synchronization Preservation protocol,
this delay must be large enough to ensure that the Accept Process receives the messages after the clock
edges during which valid messages are expected to arrive.

$$W_{SP,P1} = 2\Delta_{PP,RCV}|_{abs-max} + 1 \tag{5.173}$$





$W_{SP,P2}$, $W_{SP,P3}$, and $W_{SP,P4}$ are similarly defined. Let $\Delta_{SP,P2,RCV}|_{max}$, $\Delta_{SP,P3,RCV}|_{max}$, and $\Delta_{SP,P4,RCV}|_{max}$ denote
the maximum valid local time error for the time of reception of synchronization messages in processes
P2, P3, and P4 of the Synchronization Preservation protocol. These variables correspond to $\Delta_{P2,RCV}|_{max}$,
$\Delta_{P3,RCV}|_{max}$, and $\Delta_{P4,RCV}|_{max}$ evaluated for the case of the Synchronization Preservation protocol. Then:

$$W_{SP,P2} = 2\Delta_{SP,P2,RCV}|_{max} + 1 \tag{5.174}$$

$$W_{SP,P3} = 2\Delta_{SP,P3,RCV}|_{max} + 1 \tag{5.175}$$

$$W_{SP,P4} = 2\Delta_{SP,P4,RCV}|_{max} + 1 \tag{5.176}$$

Notice that for process P1 the expected time of reception is exactly $\Delta_{SP,P1,RCV}|_{max}$ ticks from the left edge
of the reception interval in that process. Similar observations apply to processes P2 through P4. The
computation delay is measured from the time the message to be selected is presented to the Accept
Process until the Accept output is asserted. $C_{SP,P1}$, $C_{SP,P2}$, $C_{SP,P3}$, and $C_{SP,P4}$ denote the Accept Process
delays for processes P1, P2, P3, and P4, respectively, of the Synchronization Preservation protocol.
These delays also apply to the Initial Synchronization and Synchronization Capture protocols. Then: 


$$\Delta_{P1} = W_{SP,P1} + C_{SP,P1} \tag{5.177}$$

$$\Delta_{P2} = W_{SP,P2} + C_{SP,P2} \tag{5.178}$$

$$\Delta_{P3} = W_{SP,P3} + C_{SP,P3} \tag{5.179}$$

$$\Delta_{P4} = W_{SP,P4} + C_{SP,P4} \tag{5.180}$$
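The decomposition of the Computation Process delay into a reception delay and an Accept Process delay can be sketched in a few lines of Python. The function names are hypothetical; the arithmetic follows (5.173) through (5.180).

```python
def reception_delay(max_rcv_error):
    """W = 2 * (maximum reception-time error) + 1, as in (5.173)-(5.176)."""
    return 2 * max_rcv_error + 1

def computation_process_delay(max_rcv_error, accept_delay):
    """Delta_Pk = W_SP,Pk + C_SP,Pk, as in (5.177)-(5.180)."""
    return reception_delay(max_rcv_error) + accept_delay
```

For example, a maximum reception error of 3 ticks gives a 7-tick reception delay, which covers the 3 ticks before and after the expected reception edge plus the expected edge itself.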


5.9.2. Send Process delays 

The Send Process delays must be set to ensure proper inter-process communication. The Send Process 
delay for process P0 does not need to be the same for Initial Synchronization and Synchronization
Preservation. The specification of that value for each protocol is presented below. For all the other Send 
Processes, we specify the delays based on two factors. First, we would like to specify the process delays 
based on the execution of the Synchronization Preservation protocol. The timing of execution of the 
Initial Synchronization protocol is not preferred because in that protocol the uncertainty in the time of 
reception can be extremely large, which would result in extremely large process delays for the protocol. 
The second factor when specifying the Send Process delays is the need to satisfy the minimum
data-introduction-interval constraint for the send port of the Communication Module, $\Delta_{Comm}$, which must be
satisfied at the BIUs and the RMUs. 

For the execution of the Synchronization Preservation protocol, the main concern in specifying the 
Send Process delays is ensuring proper coordination between the send and receive operations. In 
particular, the specification of the send delay must take into consideration the expected reception delay, 
the minimum delays in opening the input windows, and the size of the input windows. This is not a 
consideration in the Initial Synchronization protocol since in that case all the Computation Processes are 
enabled at the beginning of the execution of the protocol. 

For the Synchronization Preservation protocol, we must ensure that the time separation between the 
sending of INIT and ECHO messages satisfies the $\Delta_{Comm}$ constraint. The preferred method to satisfy this
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constraint in the Synchronization Preservation protocol is to increase the Send Process delays for the INIT 
messages in processes P0 and P1 and/or the ECHO messages in processes P2 and P3 until sufficient
separation between them is ensured. 

For the Initial Synchronization protocol, the problem is more complicated. Because the initial relative 
local-time skew can be much larger than the Computation Process and Send Process delays, there is no 
way to meet the $\Delta_{Comm}$ constraint by simply changing the process delays while still achieving the other
design goals. The preferred solution for this case is to add functionality to the Send Processes at the BIUs 
and the RMUs to force a minimum separation between INIT and ECHO messages. However, the 
buffering of synchronization messages for a bounded but unspecified amount of time at a Send Process is 
an undesired solution because it would result in an increase in the bound on the relative local-time skew 
achieved by the protocol. Instead, the solution is based on the observation that for the Initial 
Synchronization protocol, once the Computation Process of a node has performed the computation that 
triggers the sending of an ECHO message (i.e., Accept(INIT) in process P2 at the BIUs, and 
Accept(ECHO) in process P3 at the RMUs), there is no need for the node to send an INIT message. To 
understand this, notice that the synchronization protocol achieves synchronization in process P2, and this 
is then propagated to processes P3 and P4 using ECHO messages. For the Initial Synchronization 
protocol, RMUs and BIUs reset their local times with respect to the Accept(ECHO) outputs in processes 
P3 and P4, respectively. Therefore, the fact that an ECHO message is going to be sent means that 
whatever critical timing information was going to be provided by processes P0 and P1, it has already been
received. Therefore, the INIT messages are redundant from that point on. So, for Initial Synchronization, 
to meet the minimum data-introduction-interval constraint, the Send Process must have the following 
features: 

• The sending of an INIT message must be blocked if the message has not been sent by the time the 
Accept output that triggers the sending of an ECHO message is asserted. 

• The send delay for ECHO messages must be greater than or equal to $\Delta_{Comm} - 1$.

The first functional requirement removes redundant INIT messages. The second requirement ensures 
that, if an INIT message is sent at or before the tick at which the Accept output that triggers the sending of 
an ECHO message is asserted, then the ECHO message will be sent at least $\Delta_{Comm}$ ticks after the INIT
message. 

Let $B_{IS,P0}$, $B_{IS,P1}$, $B_{IS,P2}$, and $B_{IS,P3}$ denote the Send Process delays for processes P0, P1, P2, and P3 of
the Initial Synchronization protocol, respectively. $B_{SP,P0}$, $B_{SP,P1}$, $B_{SP,P2}$, and $B_{SP,P3}$ denote the Send
Process delays for processes P0, P1, P2, and P3 of the Synchronization Preservation protocol, respectively.
For processes P1, P2, and P3:

$$B_{P1} = B_{IS,P1} = B_{SP,P1} \tag{5.181}$$

$$B_{P2} = B_{IS,P2} = B_{SP,P2} \tag{5.182}$$

$$B_{P3} = B_{IS,P3} = B_{SP,P3} \tag{5.183}$$

$B_{IS,P0}$ and $B_{SP,P0}$ are specified separately for Initial Synchronization and Synchronization Preservation.

5.9.2.1. Send delay for process P0


5.9.2.1.1. Synchronization Preservation

The Synchronization Preservation protocol is a time-triggered, event-driven protocol. The
communication between processes P0 and P1 follows a time-triggered pattern similar to the point-to-point
communication of the synchronous protocols. After that operation, the rest of the Synchronization
Preservation protocol proceeds driven by communication and processing events. $T_{SP}$ denotes the
local-time trigger for the execution of the Synchronization Preservation protocol. $B_{SP,P0}$ denotes the send delay
for process P0 of the Synchronization Preservation protocol. $B_{SP,P0}|_{min}$ denotes the minimum send delay
for process P0. $B_{SP,P0}|_{min}$ is assumed to be the time needed to prepare the message for transmission.
$B_{SP,P0} - B_{SP,P0}|_{min}$ is additional delay added to align the send and receive operations. $\Delta_{SP,P1,RCVWND}$ denotes the
delay from the communication reference time to the opening of the reception window in process P1.
$\Delta_{SP,P1,RCVWND}|_{min}$ is the minimum value of $\Delta_{SP,P1,RCVWND}$. $R_{PP}$ denotes the expected point-to-point reception
delay. $W_{SP,P1}$ is the size of the reception window. $W_{SP,P1,pre}$ is the pre-expectation window (i.e., the size of
the section of the reception window before the expected time of reception). Considering the analysis for
point-to-point communication presented in a previous section, $W_{SP,P1,pre}$ corresponds to $W_{Deskew,pre}$. So:

$$W_{SP,P1,pre} = \Delta_{PP,RCV}|_{abs-max} \tag{5.184}$$

$T_{SP,P0,SND}$ denotes the send time for process P0. $T_{SP,P0,SND}$ corresponds to $T_{P0}$ in the general analysis of
the clock synchronization protocols. $T_{SP,P1,RCV,E}$ denotes the expected time of reception for process P1.
$T_{SP,P0-P1,REF}$ denotes the reference time for the transmission between P0 and P1.

$$T_{SP,P0-P1,REF} = T_{SP} \tag{5.185}$$

Two cases must be considered.

Case 1: $B_{SP,P0}|_{min} + R_{PP} \ge \Delta_{SP,P1,RCVWND}|_{min} + W_{SP,P1,pre}$
For this case:

$$B_{SP,P0} = B_{SP,P0}|_{min} \tag{5.186}$$

$$\Delta_{SP,P1,RCVWND} = B_{SP,P0}|_{min} + R_{PP} - W_{SP,P1,pre} \tag{5.187}$$

So:

$$T_{SP,P0,SND} = T_{SP,P0-P1,REF} + B_{SP,P0} = T_{SP} + B_{SP,P0}|_{min} \tag{5.188}$$

And:

$$T_{SP,P1,RCV,E} = T_{SP,P0,SND} + R_{PP} = T_{SP} + B_{SP,P0}|_{min} + R_{PP} \tag{5.189}$$

Case 2: $B_{SP,P0}|_{min} + R_{PP} < \Delta_{SP,P1,RCVWND}|_{min} + W_{SP,P1,pre}$
For this case:

$$B_{SP,P0} = \Delta_{SP,P1,RCVWND}|_{min} + W_{SP,P1,pre} - R_{PP} \tag{5.190}$$

$$\Delta_{SP,P1,RCVWND} = \Delta_{SP,P1,RCVWND}|_{min} \tag{5.191}$$

So:

$$T_{SP,P0,SND} = T_{SP,P0-P1,REF} + B_{SP,P0} = T_{SP} + \Delta_{SP,P1,RCVWND}|_{min} + W_{SP,P1,pre} - R_{PP} \tag{5.192}$$

And:

$$T_{SP,P1,RCV,E} = T_{SP,P0,SND} + R_{PP} = T_{SP} + \Delta_{SP,P1,RCVWND}|_{min} + W_{SP,P1,pre} \tag{5.193}$$
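The two-case selection of the P0 send delay can be sketched as a small Python function. This is a minimal sketch under stated assumptions: the function and argument names are hypothetical, and the Case 1 condition is taken to be the complement of Case 2 (greater than or equal).

```python
def p0_send_timing(t_sp, b_min, r_pp, wnd_min, w_pre):
    """Return (B_SP,P0, Delta_SP,P1,RCVWND, T_SP,P0,SND, T_SP,P1,RCV,E)."""
    if b_min + r_pp >= wnd_min + w_pre:
        # Case 1: the minimum send delay is enough; the window opens later than its minimum.
        b_p0 = b_min                        # (5.186)
        rcvwnd = b_min + r_pp - w_pre       # (5.187)
    else:
        # Case 2: pad the send delay so the window can open at its minimum delay.
        b_p0 = wnd_min + w_pre - r_pp       # (5.190)
        rcvwnd = wnd_min                    # (5.191)
    t_snd = t_sp + b_p0                     # (5.188) or (5.192)
    t_rcv_e = t_snd + r_pp                  # (5.189) or (5.193)
    return b_p0, rcvwnd, t_snd, t_rcv_e
```

In both cases the expected reception time equals the reference time plus the window-opening delay plus the pre-expectation window, which is the alignment the extra send delay is meant to achieve.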


5.9.2.1.2. Initial Synchronization

Let $\pi_{IS}$ denote the bound on the relative local-time skew considering BIUs and RMUs during the
execution of the Initial Synchronization protocol, measured in nominal clock ticks. $T_{IS}$ denotes the local
time triggering the execution of the Initial Synchronization protocol. The timing of the first-stage
communication can be analyzed similarly to the point-to-point communication for synchronous protocols.

$T_{IS,P0-P1,REF}$ denotes the reference time for the communication between processes P0 and P1. $T_{IS,P0,SND}$
denotes the local time at which process P0 sends the message. $T_{IS,P0,SND}$ corresponds to $T_{P0}$ in the general
analysis of the clock synchronization protocols. $T_{IS,P1,RCV,E}$ denotes the expected time of reception in
process P1. $B_{IS,P0}$ denotes the Send Process delay for process P0. $B_{IS,P0}|_{min}$ denotes the minimum send
delay for process P0. $\Delta_{IS,P1,RCVWND}$ denotes the delay from the communication reference time to the
opening of the reception window in process P1. $W_{IS,P1}$ denotes the size of the reception window in
process P1. $W_{IS,P1,pre}$ denotes the pre-expectation window in process P1 (i.e., the size of the section of the
reception window before the expected time of reception). We use $T_{IS}$ as the reference time for the
communication between processes P0 and P1.

$$T_{IS,P0-P1,REF} = T_{IS} \tag{5.194}$$

We use the analysis for point-to-point communication to determine $W_{IS,P1,pre}$. To determine $W_{IS,P1}$, we
need the maximum error in the expected time of reception for the Initial Synchronization protocol
messages, $\Delta_{IS,PP,RCV}|_{abs-max}$.

$$\Delta_{IS,PP,RCV}|_{abs-max} = \lfloor (1 + \rho_0)(\pi_{IS} + \max(\rho_{PP,l},\ \rho_{PP,h})) \rfloor \tag{5.195}$$

$\rho_{PP,l}$ and $\rho_{PP,h}$ are given in Section 4. So, for the reception window:

$$W_{IS,P1} = 2\Delta_{IS,PP,RCV}|_{abs-max} + 1 \tag{5.196}$$

$$W_{IS,P1,pre} = \Delta_{IS,PP,RCV}|_{abs-max} \tag{5.197}$$

$B_{IS,P0}|_{min}$ is assumed to be the time needed to prepare the message for transmission.

We expect the upper bound on the relative local-time skew during the execution of the first stage of 
the Initial Synchronization protocol to be much larger than any minimum timing constraints associated 
with the process of communication. Based on this, we assume that the following condition holds for the 
communication between processes P0 and P1.

$$B_{IS,P0}|_{min} + R_{PP} < \Delta_{IS,P1,RCVWND}|_{min} + W_{IS,P1,pre} \tag{5.198}$$

For this case:

$$B_{IS,P0} = \Delta_{IS,P1,RCVWND}|_{min} + W_{IS,P1,pre} - R_{PP} \tag{5.199}$$

$$\Delta_{IS,P1,RCVWND} = \Delta_{IS,P1,RCVWND}|_{min} \tag{5.200}$$

So:

$$T_{IS,P0,SND} = T_{IS,P0-P1,REF} + B_{IS,P0} = T_{IS} + \Delta_{IS,P1,RCVWND}|_{min} + W_{IS,P1,pre} - R_{PP} \tag{5.201}$$

And:

$$T_{IS,P1,RCV,E} = T_{IS,P0,SND} + R_{PP} = T_{IS} + \Delta_{IS,P1,RCVWND}|_{min} + W_{IS,P1,pre} \tag{5.202}$$

5.9.2.2. Send delay for process P1

$B_{P1}$ is specified based on timing considerations for Synchronization Preservation. $B_{P1}|_{min}$ is determined
by the implementation. Process P1 sends INIT to process P2. However, the reference event used to
coordinate the communication between processes P1 and P2 is the trigger time for the transmission of the
message in process P0. Let $T_{SP,P0-P2,REF}$ denote this reference.

$$T_{SP,P0-P2,REF} = T_{SP} + B_{SP,P0} \tag{5.203}$$

$R_{SP,P0-P2}$ denotes the expected reception delay for process P2 of the Synchronization Preservation protocol.
$R_{SP,P0-P2}$ is measured from the send time in process P0 to the expected time of reception in process P2.
$\Delta_{SP,P2,RCVWND}$ denotes the delay from the reference time to the opening of the input window in process P2.
$\Delta_{SP,P2,RCVWND}|_{min}$ denotes the minimum value, which is determined by the implementation. For proper
communication, the following relation must be satisfied:

$$R_{SP,P0-P2} = \Delta_{SP,P2,RCVWND} + \Delta_{SP,P2,RCV}|_{max} \tag{5.204}$$

Here, both $R_{SP,P0-P2}$ and $\Delta_{SP,P2,RCV}|_{max}$ are functions of $B_{P1}$, and $\Delta_{SP,P2,RCVWND}$ can be made larger than
$\Delta_{SP,P2,RCVWND}|_{min}$. Solving this equation for $B_{P1}$ is not trivial. However, note that $R_{SP,P0-P2}$ varies
one-to-one with respect to $B_{P1}$, while $\Delta_{SP,P2,RCV}|_{max}$ changes by approximately $2\rho_0 B_{P1}$ for each unit step in $B_{P1}$.
This observation allows us to use the following algorithm to determine $B_{P1}$. The notation $R_{SP,P0-P2}(B_{P1})$
and $\Delta_{SP,P2,RCV}|_{max}(B_{P1})$ highlights the dependence of $R_{SP,P0-P2}$ and $\Delta_{SP,P2,RCV}|_{max}$ on $B_{P1}$.

1. $B_{P1} = B_{P1}|_{min}$

2. while [$R_{SP,P0-P2}(B_{P1}) < \Delta_{SP,P2,RCV}|_{max}(B_{P1}) + \Delta_{SP,P2,RCVWND}|_{min}$]

3. { $B_{P1} = B_{P1} + 1$ }

4. Results:

5. $B_{P1}$

6. $R_{SP,P0-P2} = R_{SP,P0-P2}(B_{P1})$

7. $\Delta_{SP,P2,RCV}|_{max} = \Delta_{SP,P2,RCV}|_{max}(B_{P1})$

8. $\Delta_{SP,P2,RCVWND} = R_{SP,P0-P2} - \Delta_{SP,P2,RCV}|_{max}$

5.9.2.3. Send delay for process P2


$B_{P2}$ is specified based on timing considerations for Synchronization Preservation and the minimum
data-introduction-interval constraint of the Communication Module. $B_{P2}|_{min}$ is determined by the
implementation. $T_{SP,P1-P3,REF}$ denotes the reference time for the communication message propagation from
P1 to P3. The event used to coordinate this communication is the time of the Accept output in process
P1, denoted by $T_{SP,P1,A}$ for the Synchronization Preservation protocol.

$$T_{SP,P1-P3,REF} = T_{SP,P1,A} \tag{5.205}$$

$R_{SP,P1-P3}$ denotes the expected reception delay for process P3 of the Synchronization Preservation protocol.
$R_{SP,P1-P3}$ is measured from the time of Accept output in process P1 to the expected time of reception in
process P3. $\Delta_{SP,P3,RCVWND}$ denotes the delay from the reference time to the opening of the input window in
process P3. $\Delta_{SP,P3,RCVWND}|_{min}$ denotes the minimum value, which is determined by the implementation.
For proper communication, the following relation must be satisfied:

$$R_{SP,P1-P3} = \Delta_{SP,P3,RCVWND} + \Delta_{SP,P3,RCV}|_{max} \tag{5.206}$$

As for the case of $B_{P1}$, solving this equation for $B_{P2}$ is non-trivial. Therefore, we use here the same
algorithm used to solve for $B_{P1}$. An additional constraint is that $B_{P2}$ must be greater than or equal to
$\Delta_{Comm} - 1$.

1. $B_{P2} = B_{P2}|_{min}$

2. if ($B_{P2} < \Delta_{Comm} - 1$), then $B_{P2} = \Delta_{Comm} - 1$.

3. while [$R_{SP,P1-P3}(B_{P2}) < \Delta_{SP,P3,RCV}|_{max}(B_{P2}) + \Delta_{SP,P3,RCVWND}|_{min}$]

4. { $B_{P2} = B_{P2} + 1$ }

5. Results:

6. $B_{P2}$

7. $R_{SP,P1-P3} = R_{SP,P1-P3}(B_{P2})$

8. $\Delta_{SP,P3,RCV}|_{max} = \Delta_{SP,P3,RCV}|_{max}(B_{P2})$

9. $\Delta_{SP,P3,RCVWND} = R_{SP,P1-P3} - \Delta_{SP,P3,RCV}|_{max}$


5.9.2.4. Send delay for process P3

$B_{P3}$ is specified based on timing considerations for Synchronization Preservation and the minimum
data-introduction-interval constraint of the Communication Module. $B_{P3}|_{min}$ is determined by the
implementation. $T_{SP,P2-P4,REF}$ denotes the reference time for the communication message propagation from
P2 to P4. The event used to coordinate this communication is the time of the Accept output in process
P2, denoted by $T_{SP,P2,A}$ for the Synchronization Preservation protocol.

$$T_{SP,P2-P4,REF} = T_{SP,P2,A} \tag{5.207}$$

$R_{SP,P2-P4}$ denotes the expected reception delay for process P4 of the Synchronization Preservation protocol.
$R_{SP,P2-P4}$ is measured from the time of Accept in process P2 to the expected time of reception in process
P4. Let $\Delta_{SP,P4,RCVWND}$ denote the delay from the reference time to the opening of the input window in
process P4. $\Delta_{SP,P4,RCVWND}|_{min}$ denotes the minimum value, which is determined by the implementation.
For proper communication, the following relation must be satisfied:

$$R_{SP,P2-P4} = \Delta_{SP,P4,RCVWND} + \Delta_{SP,P4,RCV}|_{max} \tag{5.208}$$

As for the case of $B_{P2}$, solving this equation for $B_{P3}$ is non-trivial. Therefore, we use here the same
algorithm used to solve for $B_{P2}$. An additional constraint is that $B_{P3}$ must be greater than or equal to
$\Delta_{Comm} - 1$.

1. $B_{P3} = B_{P3}|_{min}$

2. if ($B_{P3} < \Delta_{Comm} - 1$), then $B_{P3} = \Delta_{Comm} - 1$.

3. while [$R_{SP,P2-P4}(B_{P3}) < \Delta_{SP,P4,RCV}|_{max}(B_{P3}) + \Delta_{SP,P4,RCVWND}|_{min}$]

4. { $B_{P3} = B_{P3} + 1$ }

5. Results:

6. $B_{P3}$

7. $R_{SP,P2-P4} = R_{SP,P2-P4}(B_{P3})$

8. $\Delta_{SP,P4,RCV}|_{max} = \Delta_{SP,P4,RCV}|_{max}(B_{P3})$

9. $\Delta_{SP,P4,RCVWND} = R_{SP,P2-P4} - \Delta_{SP,P4,RCV}|_{max}$
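The search procedure used for the send delays of processes P1, P2, and P3 can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and the two callables stand in for the dependence of the expected reception delay and the maximum reception error on the send delay, whose actual forms come from the earlier analysis and are not reproduced here. The sample models at the bottom are made-up numbers.

```python
def find_send_delay(b_min, reception_delay, max_rcv_error, rcvwnd_min, delta_comm=None):
    """Smallest B >= b_min (and >= delta_comm - 1 when given) such that
    reception_delay(B) >= max_rcv_error(B) + rcvwnd_min."""
    b = b_min
    # Optional floor from the minimum data-introduction interval (used for P2 and P3).
    if delta_comm is not None and b < delta_comm - 1:
        b = delta_comm - 1
    # Terminates as long as the reception delay grows faster with B than the error does.
    while reception_delay(b) < max_rcv_error(b) + rcvwnd_min:
        b += 1
    r = reception_delay(b)
    err = max_rcv_error(b)
    return b, r, err, r - err   # last item: opening delay of the input window

# Hypothetical models: R grows one-to-one with B, the error grows slowly with B.
b, r, err, rcvwnd = find_send_delay(
    b_min=2,
    reception_delay=lambda b: 20 + b,
    max_rcv_error=lambda b: 6 + 0.001 * b,
    rcvwnd_min=18,
    delta_comm=4,
)
```

Passing `delta_comm=None` gives the P1 variant of the algorithm, which has no data-introduction-interval floor.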


5.10. Additional considerations 


5.10.1. Frame Synchronization 

The Frame Synchronization protocol is executed by recovering nodes in the Synchronization 
Acquisition mode. The end of the Frame Synchronization protocol triggers the execution of the P3C or 
P4C synchronization-capture processes. An assumption for the protocol is the existence of a single valid 
clique in Preservation mode. The protocol monitors the ECHO messages from the trusted nodes 
identified during Local Diagnosis Acquisition. Achieving frame synchronization is equivalent to finding 
the time gap between consecutive executions of the Synchronization Preservation protocol. The Frame 
Synchronization protocol consists of searching for a time interval during which the clique is not sending 
ECHO messages. Finding such an interval indicates that the clique is in between computations of clock
adjustments, and thus it is an appropriate time to start the execution of the Synchronization Capture
protocol. The Frame Synchronization protocol can achieve synchronization even if, for the node 
executing the protocol, it is not true that a majority of the eligible sources of the opposite kind is 
trustworthy. The time interval measured by the gap timer, called the frame synchronization gap, 
corresponds to the maximum observed relative skew between received ECHO messages from trustworthy 
nodes. The analysis presented here applies to BIUs and RMUs. 

Let $\Delta_{FS,GAP}$ denote the duration of the frame synchronization gap, measured in local clock ticks.
$\Delta_{FS,GAP}|_{RMU}$ and $\Delta_{FS,GAP}|_{BIU}$ correspond to $\Delta_{FS,GAP}$ for RMUs and BIUs, respectively.

$$\Delta_{FS,GAP}|_{RMU} = \Pi_{SP,P3C,RCV} \tag{5.209}$$

$$\Delta_{FS,GAP}|_{BIU} = \Pi_{SP,P4C,RCV} \tag{5.210}$$

We choose $\Delta_{FS,GAP}|_{max}$ to be the largest value of $\Delta_{FS,GAP}$.

$$\Delta_{FS,GAP}|_{max} = \max(\Pi_{SP,P3C,RCV},\ \Pi_{SP,P4C,RCV}) \tag{5.211}$$

We are interested in the worst-case duration of the Frame Synchronization protocol. $\Delta_{FS}$ denotes the
actual duration of the execution of the Frame Synchronization protocol, measured in local clock ticks. To
determine the maximum duration of the Frame Synchronization protocol, we need to consider the
possible patterns of interruption of the interval timer.

N denotes the total number of BIU nodes, and M denotes the total number of RMU nodes. Let $\omega$
denote the number of eligible sources of the opposite kind.

$$\omega|_{max} = \max(N, M) \tag{5.212}$$

An assumption for the Frame Synchronization protocol is that, during its execution, it
will encounter at most one execution of the Synchronization Preservation protocol. Therefore, a source is
allowed to interrupt the interval timer at most once during the execution of the protocol. In the worst
case, interruptions from eligible sources can consume up to $\omega|_{max} \cdot \Delta_{FS,GAP}|_{max}$ local ticks in failed attempts
to find a quiet frame synchronization gap (i.e., an interval with no gap timer interruptions). Adding an
additional $\Delta_{FS,GAP}$ for the last interval, for which interruptions would not be allowed, then:

$$\Delta_{FS}|_{max} = (\omega|_{max} + 1) \cdot \Delta_{FS,GAP}|_{max} \tag{5.213}$$

$\delta_{FS}$ denotes the worst-case duration of the Frame Synchronization protocol measured in nominal clock
ticks.

$$\delta_{FS}|_{max} = (1 + \rho_0)\,\Delta_{FS}|_{max} \tag{5.214}$$

The assumption that, during its execution, the Frame Synchronization protocol will encounter at most
one execution of the Synchronization Preservation protocol imposes the following constraint on the
minimum duration of the resynchronization period $p_{min}$.

$$p_{min} \ge \delta_{FS}|_{max} \tag{5.215}$$
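The worst-case duration of the Frame Synchronization protocol follows directly from (5.211) through (5.214). A minimal Python sketch, with hypothetical function and argument names:

```python
def frame_sync_worst_case(n_bius, m_rmus, gap_rmu, gap_biu, rho0):
    """Return (Delta_FS|max in local ticks, delta_FS|max in nominal ticks)."""
    omega_max = max(n_bius, m_rmus)                 # (5.212)
    gap_max = max(gap_rmu, gap_biu)                 # (5.211)
    # Each eligible source may spoil one gap attempt; one more gap must then run clean.
    delta_fs_local = (omega_max + 1) * gap_max      # (5.213)
    delta_fs_nominal = (1 + rho0) * delta_fs_local  # (5.214)
    return delta_fs_local, delta_fs_nominal
```

The second returned value is the lower limit that (5.215) places on the minimum resynchronization period.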

5.10.2. Executing Synchronization Preservation after Synchronization Acquisition

Recovering BIUs synchronize to an existing clique by executing the Synchronization Capture protocol 
and synchronizing with respect to process P4C. After synchronizing, the recovering BIUs behave 
synchronously just like the existing BIU members of the clique. In the first execution of the 
Synchronization Preservation protocol, the recovering BIUs synchronize with respect to process P2. 
There are two important differences between the existing trustworthy BIU members of a clique and good 
recovering BIUs during this execution of the Synchronization Preservation protocol. 

The first difference is that good recovering BIUs do not transmit synchronization messages. Even if 
they transmitted messages, the existing members of the clique would not include those messages in the 
computation of the protocol. Therefore, for the existing trustworthy clique members, the result of the 
synchronization protocol does not depend on the performance of recovering BIUs. 

The second (and more important) difference is that the good recovering BIUs are not necessarily 
synchronized to the existing trustworthy BIU clique members as tightly as the existing trustworthy BIU 
clique members are synchronized to each other. This difference is significant in the execution of process
P2 and in the definitions of the expected reception delay $R_{SP,P0-P2}$ and the worst-case local-time difference
between the actual time of reception and the expected time of reception $\Delta_{SP,P2,RCV}|_{max}$. As presented
previously, the definition of these parameters uses the relative local-time skew of the transmitting BIUs in
process P0 (i.e., $\pi_{P0}$). Because good recovering BIUs are not included in the definition of $\pi_{P0}$, their
effective reception delay and its worst-case error can be different than for the trustworthy BIU clique
members. To correct this problem, the relative local-time skew used to compute $R_{SP,P0-P2}$ and $\Delta_{SP,P2,RCV}|_{max}$
must include the good recovering BIUs.

$\pi_{SP,P0}|_{P2,RCV}$ denotes the value of $\pi_{P0}$ used to compute $R_{SP,P0-P2}$ and $\Delta_{SP,P2,RCV}|_{max}$ for process P2 of the
Synchronization Preservation protocol.

$$\pi_{SP,P0}|_{P2,RCV} = \pi_{P2+P4C,H} + [(1 + \rho_0) - 1/(1 + \rho_0)]P \tag{5.216}$$

$\pi_{SP,P0}$ denotes the value of $\pi_{P0}$ for the Synchronization Preservation protocol. Except for the case above,
$\pi_{SP,P0}$ is:

$$\pi_{SP,P0} = \pi_{P2,H} + [(1 + \rho_0) - 1/(1 + \rho_0)]P \tag{5.217}$$


5.10.3. Time service accuracy for the Synchronization Preservation protocol 

The PEs receive periodic time updates from the BIUs in the form of INIT messages. These messages 
are triggered by the output of the Accept(INIT) functions in process P2 of the Synchronization 
Preservation protocol. The accuracy of the time service is defined here as the maximum error in the 
expect period between Accept(INIT) outputs in consecutive executions of the Synchronization 
Preservation protocol. 

Consider two consecutive executions of the Synchronization Preservation protocol, denoted by SP1 
and SP2. 7I P 2 ,a denotes the bound on the real-time relative skew of the Accept outputs in process P2 at the 
trustworthy BIU nodes. 7t P2 .A applies to SP1 and SP2. Let t P2 ,A,ilspi and t P2 ,A.hlspi denote the bounds on the 
earliest and latest real times, respectively, at which the trustworthy BIU nodes synchronizing with respect 
to process P2 of SP1 assert the output of their Accept(INIT) functions. Thus: 

7tp2,A — tp2,A,hlsPl ' tp2,A,llsPl (5.218) 

Let t P2jA ,ilsp 2 and t P2iA ,hlsP 2 denote the bounds on the earliest and latest real times, respectively, at which 

the Accept outputs are asserted in process P2 at the trustworthy BIUs for SP2. The relations between 

tp 2 ,A,hlspi and t F2 ,A.ilsph and t P2jA ,hlsp 2 - tp 2 ,A,ilsp 2 are constrained by the drift rate of the local-time clocks and 
the validity interval for the Accept outputs in process P2 of SP2. 

tp 2 ,A,ilsP 2 = tp 2 ,A,ilspi + 2r PP j + (H P 2 + T$p + B P o + A P i + B P i + A P 2)/(1 + po) (5.219) 

tp 2 ,A,hlsP 2 = tp 2 ,A,hlspi + 2r PPj h + (1 + po)(H P 2 + Tsp + B P o + A P i + B P i + A P 2 ) (5.220) 

$P_{Svc}|_{min}$ and $P_{Svc}|_{max}$ denote the minimum and maximum intervals, respectively, between time updates 
for the time-reference service, measured in units of nominal clock ticks. 

$P_{Svc}|_{min} = t_{P2,A,l}|_{SP2} - t_{P2,A,h}|_{SP1}$ 
$= 2r_{PP,l} + (H_{P2} + T_{SP} + B_{P0} + A_{P1} + B_{P1} + A_{P2})/(1 + \rho_0) - \pi_{P2,A}$ (5.221) 

$P_{Svc}|_{max} = t_{P2,A,h}|_{SP2} - t_{P2,A,l}|_{SP1}$ 
$= 2r_{PP,h} + (1 + \rho_0)(H_{P2} + T_{SP} + B_{P0} + A_{P1} + B_{P1} + A_{P2}) + \pi_{P2,A}$ (5.222) 

$P_{Svc}$ denotes the expected period between time updates for the time-reference service, measured in units 
of nominal clock ticks. 

$P_{Svc} = (P_{Svc}|_{min} + P_{Svc}|_{max})/2$ (5.223) 

Let $a$ denote the accuracy of $P_{Svc}$. 

$a = (P_{Svc}|_{max} - P_{Svc}|_{min})/2$ 
$= \pi_{P2,A} + \varepsilon_{PP} + (1/2)[(1 + \rho_0) - 1/(1 + \rho_0)](H_{P2} + T_{SP} + B_{P0} + A_{P1} + B_{P1} + A_{P2})$ (5.224) 

Substituting for $\pi_{P2,A}$: 

$a = 3\varepsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)][(3/2)(A_{P1} + B_{P1} + A_{P2}) + (1/2)(H_{P2} + T_{SP} + B_{P0})]$ (5.225) 
(5.225) 
6. Clique Preservation mode 


This section examines the protocols executed in the Clique Preservation mode. The timing analysis 
also applies to the Clique Join mode. 

Figure 6.1 illustrates the message exchange pattern between a BIU and its attached PE during the 
Clique Join and the Clique Preservation modes. The mode and identification messages are sent to the PE 
between the Self-Diagnosis and Schedule Update services. During Schedule Update, the BIU reads the 
schedule submitted by its PE and sends to the PE the results determined by the bus for each PE. This is 
followed by a single message with the assessment of the new schedule. This consists of a SPECIAL 
message with the payload set to VALID_SCHEDULE, ZERO_SCHEDULE (i.e., the schedule is valid 
and equal to zero for all the PEs), or INVALID_SCHEDULE. If the schedule is invalid, the ROBUS will 
automatically switch to a default schedule. During the PE Broadcast service, a BIU reads the scheduled 
messages from its PE and outputs to the PE the result for all the scheduled messages. A broadcast result 
equal to PE_ERROR indicates that there was an error at the source PE. If the bus determines that the BIU 
of a source PE is not operating properly, then the result of the broadcast will be SOURCE_ERROR or 
NO_MAJORITY. A result of NO_MAJORITY indicates that the RMUs received different messages 
from the source BIU. If the assessment of the schedule was ZERO_SCHEDULE, the PE Broadcast 
service is not executed and the ROBUS simply waits until it is time to execute the Time Reference 
service. During the Time Reference service, a BIU outputs a SPECIAL message with the payload set to 
INIT. The sending of this message is triggered by the reference event that the BIU will use to reset its 
local-time clock. (For Initial Synchronization and Synchronization Acquisition, the payload is set to 
ECHO to explicitly indicate that a different protocol event is used as a reference to reset the local-time 
clock.) During the Self-Diagnosis service, the output of a BIU consists of two messages containing the 
diagnostic results for the BIUs and the RMUs. These are the last messages of the cycle. The next 
messages are the mode and identification messages for the next cycle. 
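
The service sequence described above can be summarized in a short sketch. It is illustrative only: the enum and function names are hypothetical, and the ordering simply follows the order in which the paragraph describes the services within a cycle.

```python
from enum import Enum


class ScheduleAssessment(Enum):
    """Payload values of the schedule-assessment SPECIAL message."""
    VALID_SCHEDULE = "VALID_SCHEDULE"
    ZERO_SCHEDULE = "ZERO_SCHEDULE"
    INVALID_SCHEDULE = "INVALID_SCHEDULE"


def services_in_cycle(assessment):
    """List the bus services executed in one cycle (hypothetical helper).

    The mode and identification messages precede Schedule Update.
    PE Broadcast is skipped when the schedule is zero for all PEs.
    An invalid schedule still runs PE Broadcast, on the default schedule.
    """
    services = ["Schedule Update"]
    if assessment is not ScheduleAssessment.ZERO_SCHEDULE:
        services.append("PE Broadcast")
    services += ["Time Reference", "Self-Diagnosis"]
    return services
```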
Figure 6.1: Message exchange pattern between a BIU and its attached PE in Clique Preservation mode 
A: PE-to-BIU messages, B: Bus services, C: BIU-to-PE messages 
6.1. Collective Diagnosis Protocol 

Figure 6.2 shows the message flow graph for the Collective Diagnosis protocol. For this protocol, 
BIUs and RMUs have the same timing characteristics. The analysis presented here does not refer to the 
kind of the node sending or receiving messages for any of the protocol processes. 


Figure 6.2: Message flow graph for the Collective Diagnosis protocol (node rows in the graph: PEs, BIUs, RMUs) 


6.1.1. Stage 1: P0 to P1 

$T_{CD}$ denotes the local-time trigger for the execution of the Collective Diagnosis protocol, $R_{PP}$ denotes 
the point-to-point expected reception delay, $S_{CD,P0}$ denotes the Send Process delay for process P0, 
$S_{CD,P0}|_{min}$ denotes the minimum send delay for process P0, and $\Delta_{CD,P1,RCVWND}$ denotes the delay from the 
communication reference time to the opening of the reception window in process P1; $\Delta_{CD,P1,RCVWND}|_{min}$ is 
its minimum value. $T_{CD,P0,SND}$ denotes the send time for process P0, and $T_{CD,P1,RCV,E}$ denotes the expected 
time of reception for process P1. 

6.1.1.1. Communication between processes P0 and P1 

Let $T_{CD,P0-P1,REF}$ denote the reference time for the transmission between P0 and P1. 

$T_{CD,P0-P1,REF} = T_{CD}$ (6.1) 

Case 1: $S_{CD,P0}|_{min} + R_{PP} \geq \Delta_{CD,P1,RCVWND}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{CD,P0} = S_{CD,P0}|_{min}$ (6.2) 

$\Delta_{CD,P1,RCVWND} = S_{CD,P0}|_{min} + R_{PP} - W_{Deskew,pre}$ (6.3) 

So: 

$T_{CD,P0,SND} = T_{CD,P0-P1,REF} + S_{CD,P0}$ 
$= T_{CD} + S_{CD,P0}|_{min}$ (6.4) 

And: 

$T_{CD,P1,RCV,E} = T_{CD,P0,SND} + R_{PP}$ 
$= T_{CD} + S_{CD,P0}|_{min} + R_{PP}$ (6.5) 

Case 2: $S_{CD,P0}|_{min} + R_{PP} < \Delta_{CD,P1,RCVWND}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{CD,P0} = \Delta_{CD,P1,RCVWND}|_{min} + W_{Deskew,pre} - R_{PP}$ (6.6) 

$\Delta_{CD,P1,RCVWND} = \Delta_{CD,P1,RCVWND}|_{min}$ (6.7) 

So: 

$T_{CD,P0,SND} = T_{CD,P0-P1,REF} + S_{CD,P0}$ 
$= T_{CD} + \Delta_{CD,P1,RCVWND}|_{min} + W_{Deskew,pre} - R_{PP}$ (6.8) 

And: 

$T_{CD,P1,RCV,E} = T_{CD,P0,SND} + R_{PP}$ 
$= T_{CD} + \Delta_{CD,P1,RCVWND}|_{min} + W_{Deskew,pre}$ (6.9) 
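
The two cases above amount to choosing the send delay and the window-opening delay so that the expected reception falls exactly at the end of the pre-reception part of the deskewing window. A minimal sketch of that selection follows; the function and parameter names are hypothetical, and all values are assumed to be integer tick counts.

```python
def align_send_with_window(s_min, r_pp, rcvwnd_min, w_pre):
    """Select S_CD,P0 and Delta_CD,P1,RCVWND per Eqs. (6.2)-(6.9).

    s_min      : minimum send delay of the sender
    r_pp       : expected point-to-point reception delay R_PP
    rcvwnd_min : minimum delay to open the reception window
    w_pre      : W_Deskew,pre
    Returns (send_delay, window_delay, expected_reception_offset),
    with the offset measured from the reference time.
    """
    if s_min + r_pp >= rcvwnd_min + w_pre:
        # Case 1: the sender is the bottleneck; the window opens later.
        send_delay = s_min
        window_delay = s_min + r_pp - w_pre
    else:
        # Case 2: the receiver is the bottleneck; the sender waits.
        send_delay = rcvwnd_min + w_pre - r_pp
        window_delay = rcvwnd_min
    return send_delay, window_delay, send_delay + r_pp
```

In both cases the expected reception offset equals the window delay plus `w_pre`, which is the alignment the equations are built to guarantee.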


6.1.1.2. Computation in process P1 

$C_{CD,P1}$ denotes the computation delay in process P1. The computation delay is measured from the end 
of the deskewing window. $C_{CD,P1}$ is assumed to be a constant independent of the computation input or the 
system state. $T_{CD,P1,C}$ denotes the local time at which the Computation Process outputs the result for 
process P1. 

$T_{CD,P1,C} = T_{CD,P1,RCV,E} + W_{Deskew,post} + C_{CD,P1}$ (6.10) 

6.1.2. Stage 2: P1 to P2 

6.1.2.1. Communication between processes P1 and P2 

Let $T_{CD,P1-P2,REF}$ denote the reference time for the transmission between P1 and P2. We choose the end 
of the deskewing window in process P1 as the reference time. 

$T_{CD,P1-P2,REF} = T_{CD,P1,RCV,E} + W_{Deskew,post}$ (6.11) 

$S_{CD,P1}$ denotes the Send Process delay for process P1, $\Delta_{CD,P2,RCVWND}$ denotes the delay in opening the 
reception window in process P2, $T_{CD,P1,SND}$ denotes the send time for process P1, and $T_{CD,P2,RCV,E}$ denotes 
the expected time of reception for process P2. 

We assume that the constraint on the minimum data introduction interval for the Communication 
Module is satisfied by the minimum delay between the send times for processes P0 and P1. That is, we 
assume that the following is true: 

$R_{PP} + W_{Deskew,post} + C_{CD,P1} + S_{CD,P1}|_{min} \geq \Delta_{Comm}$ (6.12) 

Case 1: $C_{CD,P1} + S_{CD,P1}|_{min} + R_{PP} \geq \Delta_{CD,P2,RCVWND}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{CD,P1} = S_{CD,P1}|_{min}$ (6.13) 

$\Delta_{CD,P2,RCVWND} = C_{CD,P1} + S_{CD,P1}|_{min} + R_{PP} - W_{Deskew,pre}$ (6.14) 

So: 

$T_{CD,P1,SND} = T_{CD,P1-P2,REF} + C_{CD,P1} + S_{CD,P1}$ 
$= T_{CD,P1,RCV,E} + W_{Deskew,post} + C_{CD,P1} + S_{CD,P1}|_{min}$ (6.15) 

And: 

$T_{CD,P2,RCV,E} = T_{CD,P1,SND} + R_{PP}$ 
$= T_{CD,P1,RCV,E} + W_{Deskew,post} + C_{CD,P1} + S_{CD,P1}|_{min} + R_{PP}$ (6.16) 

Case 2: $C_{CD,P1} + S_{CD,P1}|_{min} + R_{PP} < \Delta_{CD,P2,RCVWND}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{CD,P1} = \Delta_{CD,P2,RCVWND}|_{min} + W_{Deskew,pre} - R_{PP} - C_{CD,P1}$ (6.17) 

$\Delta_{CD,P2,RCVWND} = \Delta_{CD,P2,RCVWND}|_{min}$ (6.18) 

So: 

$T_{CD,P1,SND} = T_{CD,P1-P2,REF} + C_{CD,P1} + S_{CD,P1}$ 
$= T_{CD,P1,RCV,E} + W_{Deskew,post} + \Delta_{CD,P2,RCVWND}|_{min} + W_{Deskew,pre} - R_{PP}$ (6.19) 

And: 

$T_{CD,P2,RCV,E} = T_{CD,P1,SND} + R_{PP}$ 
$= T_{CD,P1,RCV,E} + W_{Deskew,post} + \Delta_{CD,P2,RCVWND}|_{min} + W_{Deskew,pre}$ (6.20) 



6.1.2.2. Computation in process P2 

$C_{CD,P2}$ denotes the computation delay in process P2. The computation delay is measured from the end 
of the deskewing window. $C_{CD,P2}$ is assumed to be a constant independent of the computation input and 
the system state. $T_{CD,P2,C}$ denotes the local time at which the Computation Process outputs the result for 
process P2. 

$T_{CD,P2,C} = T_{CD,P2,RCV,E} + W_{Deskew,post} + C_{CD,P2}$ (6.21) 

6.1.3. Stage 3: P2 to P3 

6.1.3.1. Communication between processes P2 and P3 

Let $T_{CD,P2-P3,REF}$ denote the reference time for the transmission between P2 and P3. We choose the end 
of the deskewing window in process P2 as the reference time. 

$T_{CD,P2-P3,REF} = T_{CD,P2,RCV,E} + W_{Deskew,post}$ (6.22) 

$S_{CD,P2}$ denotes the Send Process delay for process P2, $\Delta_{CD,P3,RCVWND}$ denotes the delay in opening the 
reception window in process P3, $T_{CD,P2,SND}$ denotes the send time for process P2, and $T_{CD,P3,RCV,E}$ denotes 
the expected time of reception for process P3. 

We assume that the constraint on the minimum data introduction interval for the Communication 
Module is satisfied by the minimum delay between the send times for processes P1 and P2. That is, we 
assume that the following is true: 

$R_{PP} + W_{Deskew,post} + C_{CD,P2} + S_{CD,P2}|_{min} \geq \Delta_{Comm}$ (6.23) 

Case 1: $C_{CD,P2} + S_{CD,P2}|_{min} + R_{PP} \geq \Delta_{CD,P3,RCVWND}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{CD,P2} = S_{CD,P2}|_{min}$ (6.24) 

$\Delta_{CD,P3,RCVWND} = C_{CD,P2} + S_{CD,P2}|_{min} + R_{PP} - W_{Deskew,pre}$ (6.25) 

So: 

$T_{CD,P2,SND} = T_{CD,P2-P3,REF} + C_{CD,P2} + S_{CD,P2}$ 
$= T_{CD,P2,RCV,E} + W_{Deskew,post} + C_{CD,P2} + S_{CD,P2}|_{min}$ (6.26) 

And: 

$T_{CD,P3,RCV,E} = T_{CD,P2,SND} + R_{PP}$ 
$= T_{CD,P2,RCV,E} + W_{Deskew,post} + C_{CD,P2} + S_{CD,P2}|_{min} + R_{PP}$ (6.27) 

Case 2: $C_{CD,P2} + S_{CD,P2}|_{min} + R_{PP} < \Delta_{CD,P3,RCVWND}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{CD,P2} = \Delta_{CD,P3,RCVWND}|_{min} + W_{Deskew,pre} - R_{PP} - C_{CD,P2}$ (6.28) 

$\Delta_{CD,P3,RCVWND} = \Delta_{CD,P3,RCVWND}|_{min}$ (6.29) 

So: 

$T_{CD,P2,SND} = T_{CD,P2-P3,REF} + C_{CD,P2} + S_{CD,P2}$ 
$= T_{CD,P2,RCV,E} + W_{Deskew,post} + \Delta_{CD,P3,RCVWND}|_{min} + W_{Deskew,pre} - R_{PP}$ (6.30) 

And: 

$T_{CD,P3,RCV,E} = T_{CD,P2,SND} + R_{PP}$ 
$= T_{CD,P2,RCV,E} + W_{Deskew,post} + \Delta_{CD,P3,RCVWND}|_{min} + W_{Deskew,pre}$ (6.31) 

6.1.3.2. Computation in process P3 

Let $C_{CD,P3}$ denote the computation delay in process P3. The computation delay is measured from the 
end of the deskewing window. $C_{CD,P3}$ is assumed to be a constant independent of the computation input 
and the system state. $T_{CD,P3,C}$ denotes the local time at which the Computation Process outputs the result 
for process P3. 

$T_{CD,P3,C} = T_{CD,P3,RCV,E} + W_{Deskew,post} + C_{CD,P3}$ (6.32) 

6.1.4. Stage 4: P3 to P4 

6.1.4.1. Communication between processes P3 and P4 

Let $T_{CD,P3-P4,REF}$ denote the reference time for the transmission between P3 and P4. We choose the end 
of the deskewing window in process P3 as the reference time. 

$T_{CD,P3-P4,REF} = T_{CD,P3,RCV,E} + W_{Deskew,post}$ (6.33) 

$S_{CD,P3}$ denotes the Send Process delay for process P3, $\Delta_{CD,P4,RCVWND}$ denotes the delay in opening the 
reception window in process P4, $T_{CD,P3,SND}$ denotes the send time for process P3, and $T_{CD,P4,RCV,E}$ denotes 
the expected time of reception for process P4. 

We assume that the constraint on the minimum data introduction interval for the Communication 
Module is satisfied by the minimum delay between the send times for processes P2 and P3. That is, we 
assume that the following is true: 

$R_{PP} + W_{Deskew,post} + C_{CD,P3} + S_{CD,P3}|_{min} \geq \Delta_{Comm}$ (6.34) 

Case 1: $C_{CD,P3} + S_{CD,P3}|_{min} + R_{PP} \geq \Delta_{CD,P4,RCVWND}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{CD,P3} = S_{CD,P3}|_{min}$ (6.35) 

$\Delta_{CD,P4,RCVWND} = C_{CD,P3} + S_{CD,P3}|_{min} + R_{PP} - W_{Deskew,pre}$ (6.36) 

So: 

$T_{CD,P3,SND} = T_{CD,P3-P4,REF} + C_{CD,P3} + S_{CD,P3}$ 
$= T_{CD,P3,RCV,E} + W_{Deskew,post} + C_{CD,P3} + S_{CD,P3}|_{min}$ (6.37) 

And: 

$T_{CD,P4,RCV,E} = T_{CD,P3,SND} + R_{PP}$ 
$= T_{CD,P3,RCV,E} + W_{Deskew,post} + C_{CD,P3} + S_{CD,P3}|_{min} + R_{PP}$ (6.38) 

Case 2: $C_{CD,P3} + S_{CD,P3}|_{min} + R_{PP} < \Delta_{CD,P4,RCVWND}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{CD,P3} = \Delta_{CD,P4,RCVWND}|_{min} + W_{Deskew,pre} - R_{PP} - C_{CD,P3}$ (6.39) 

$\Delta_{CD,P4,RCVWND} = \Delta_{CD,P4,RCVWND}|_{min}$ (6.40) 

So: 

$T_{CD,P3,SND} = T_{CD,P3-P4,REF} + C_{CD,P3} + S_{CD,P3}$ 
$= T_{CD,P3,RCV,E} + W_{Deskew,post} + \Delta_{CD,P4,RCVWND}|_{min} + W_{Deskew,pre} - R_{PP}$ (6.41) 

And: 

$T_{CD,P4,RCV,E} = T_{CD,P3,SND} + R_{PP}$ 
$= T_{CD,P3,RCV,E} + W_{Deskew,post} + \Delta_{CD,P4,RCVWND}|_{min} + W_{Deskew,pre}$ (6.42) 

6.1.4.2. Computation in process P4 

Let $C_{CD,P4}$ denote the computation delay in process P4. The computation delay is measured from the 
end of the deskewing window. $C_{CD,P4}$ is assumed to be a constant independent of the computation input 
and the system state. Let $T_{CD,P4,C}$ denote the local time at which the Computation Process outputs the 
result for process P4. 

$T_{CD,P4,C} = T_{CD,P4,RCV,E} + W_{Deskew,post} + C_{CD,P4}$ (6.43) 

6.1.5. Duration of the protocol 

Let $\Delta_{CD,P4,C-END}$ denote the delay in process P4 from the end of computation to the end of the Collective 
Diagnosis protocol. $\Delta_{CD,P4,C-END}$ is assumed to be a constant independent of the system state. Let $T_{CD,END}$ 
denote the local time at which the execution of the Collective Diagnosis protocol ends. 

$T_{CD,END} = T_{CD,P4,C} + \Delta_{CD,P4,C-END}$ (6.44) 

Let $\Delta_{CD}$ denote the duration of the execution of the Collective Diagnosis protocol. 

$\Delta_{CD} = T_{CD,END} - T_{CD}$ (6.45) 
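
Chaining the four stages gives the protocol duration directly. The sketch below is an illustration under stated assumptions: the function name and the list-based parameter layout are hypothetical, and each stage's two cases are folded into a single `max`, which yields the same expected reception time as the case analysis.

```python
def collective_diagnosis_duration(s_min, rcvwnd_min, c, r_pp, w_pre, w_post,
                                  end_delay):
    """Duration Delta_CD = T_CD,END - T_CD per Eqs. (6.1)-(6.45).

    s_min[k]      : minimum send delay of process Pk, k = 0..3
    rcvwnd_min[k] : minimum window-opening delay of process P(k+1), k = 0..3
    c[k]          : computation delay of process P(k+1), k = 0..3
    end_delay     : Delta_CD,P4,C-END
    """
    t = 0  # local time measured relative to T_CD
    for k in range(4):
        # Computation in Pk precedes its send, except for P0.
        compute_before_send = c[k - 1] if k > 0 else 0
        # Expected reception offset from this stage's reference time.
        t += max(compute_before_send + s_min[k] + r_pp,
                 rcvwnd_min[k] + w_pre)
        if k < 3:
            # The next reference time is the end of the deskewing window.
            t += w_post
    t += w_post + c[3]    # T_CD,P4,C, Eq. (6.43)
    return t + end_delay  # Eqs. (6.44) and (6.45)
```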

6.2. Schedule Update Protocol 

Figure 6.3 shows the message flow graph for the Schedule Update protocol. 

Figure 6.3: Message flow graph for the Schedule Update protocol (node rows in the graph: PEs, BIUs, RMUs) 



6.2.1. Stage 1: P0 to P1 

Let $T_{SU}$ denote the local-time trigger for the execution of the Schedule Update protocol. The length of 
the message stream is equal to the number of BIUs, denoted by N. Let i denote the index for the 
messages in the stream. 

$0 \leq i \leq N - 1$ (6.46) 

Let $\Delta_{SU}$ denote the data introduction interval for the stream. $\Delta_{SU}$ must be greater than or equal to the 
minimum data introduction interval for the Communication and Computation Modules. 

$\Delta_{SU} \geq \max(\Delta_{Comm}, \Delta_{Comp})$ (6.47) 

Let $S_{SU,P0}$ denote the Send Process delay for process P0. $S_{SU,P0}$ is assumed to apply to all the messages 
in the stream. Let $\Delta_{SU,P1,RCVWND}$ denote the delay from the communication reference time to the opening 
of the reception window in process P1. $\Delta_{SU,P1,RCVWND}$ is assumed to apply to all the messages in the 
stream. 

$T_{SU,P0,SND,i}$ denotes the send time for the i-th message in process P0 and $T_{SU,P1,RCV,E,i}$ denotes the 
expected time of reception for the i-th message in process P1. $T_{SU,P0-P1,REF,i}$ denotes the reference time for 
the transmission of the i-th message between P0 and P1. 

$T_{SU,P0-P1,REF,i} = T_{SU} + i\Delta_{SU}$ (6.48) 

For the version of the RPP covered in this document, the RPP sends mode and node identification 
messages to the PE triggered by the beginning of the Schedule Update protocol. 

The RPP sends the current-mode message to the PE $\Delta_{SU,SND-MODE}$ ticks after the beginning of the 
protocol. $\Delta_{SU,SND-MODE-READ-PE}$ denotes the delay from the time the mode message is sent to the PE to the 
time at which the first schedule message is read from the PE. The RPP is designed to read each PE 
message 1 tick before it is to be sent. The absolute minimum value of $S_{SU,P0}$, denoted by $S_{SU,P0}|_{min}$, and the 
delay to read the first PE message constrain the actual minimum value of $S_{SU,P0}$ to the following effective 
minimum value. 

$S_{SU,P0}|_{min,eff} = \max(S_{SU,P0}|_{min}, \Delta_{SU,SND-MODE} + \Delta_{SU,SND-MODE-READ-PE}|_{min} + 1)$ (6.49) 
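
As a small worked illustration of (6.49), the sketch below computes the effective minimum send delay. The function and parameter names are hypothetical and all values are assumed to be integer tick counts.

```python
def effective_min_send_delay(s_min, snd_mode_delay, mode_to_read_min):
    """Effective minimum of S_SU,P0 per Eq. (6.49).

    s_min            : absolute minimum send delay S_SU,P0|min
    snd_mode_delay   : Delta_SU,SND-MODE
    mode_to_read_min : Delta_SU,SND-MODE-READ-PE|min
    The +1 accounts for reading each PE message 1 tick before it is sent.
    """
    return max(s_min, snd_mode_delay + mode_to_read_min + 1)
```

For example, with `s_min = 3`, `snd_mode_delay = 4`, and `mode_to_read_min = 2`, the PE read path dominates and the effective minimum is 7 ticks.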


6.2.1.1. Communication between processes P0 and P1 

Two cases must be considered. 

Case 1: $S_{SU,P0}|_{min,eff} + R_{PP} \geq \Delta_{SU,P1,RCVWND}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{SU,P0} = S_{SU,P0}|_{min,eff}$ (6.50) 

$\Delta_{SU,P1,RCVWND} = S_{SU,P0}|_{min,eff} + R_{PP} - W_{Deskew,pre}$ (6.51) 

So: 

$T_{SU,P0,SND,i} = T_{SU,P0-P1,REF,i} + S_{SU,P0}$ 
$= T_{SU} + i\Delta_{SU} + S_{SU,P0}|_{min,eff}$ (6.52) 

And: 

$T_{SU,P1,RCV,E,i} = T_{SU,P0,SND,i} + R_{PP}$ 
$= T_{SU} + i\Delta_{SU} + S_{SU,P0}|_{min,eff} + R_{PP}$ (6.53) 

Case 2: $S_{SU,P0}|_{min,eff} + R_{PP} < \Delta_{SU,P1,RCVWND}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{SU,P0} = \Delta_{SU,P1,RCVWND}|_{min} + W_{Deskew,pre} - R_{PP}$ (6.54) 

$\Delta_{SU,P1,RCVWND} = \Delta_{SU,P1,RCVWND}|_{min}$ (6.55) 

So: 

$T_{SU,P0,SND,i} = T_{SU,P0-P1,REF,i} + S_{SU,P0}$ 
$= T_{SU} + i\Delta_{SU} + \Delta_{SU,P1,RCVWND}|_{min} + W_{Deskew,pre} - R_{PP}$ (6.56) 

And: 

$T_{SU,P1,RCV,E,i} = T_{SU,P0,SND,i} + R_{PP}$ 
$= T_{SU} + i\Delta_{SU} + \Delta_{SU,P1,RCVWND}|_{min} + W_{Deskew,pre}$ (6.57) 
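
The per-message send and reception times for the first stage of the stream can be tabulated as follows. This is a sketch under stated assumptions: the names are hypothetical, and the two cases are expressed as one `max`, since Case 1 holds exactly when the effective minimum send delay is at least as large as the Case 2 value.

```python
def stream_stage1_times(t_su, n, interval, s_min_eff, r_pp, rcvwnd_min, w_pre):
    """Send and expected reception times per Eqs. (6.48)-(6.57).

    t_su      : local-time trigger T_SU
    n         : number of messages in the stream (number of BIUs, N)
    interval  : data introduction interval of the stream
    s_min_eff : effective minimum send delay S_SU,P0|min,eff
    Returns (send_times, reception_times), one entry per message index i.
    """
    # Case 1 keeps s_min_eff; Case 2 delays the send to match the window.
    send_delay = max(s_min_eff, rcvwnd_min + w_pre - r_pp)
    send_times = [t_su + i * interval + send_delay for i in range(n)]
    reception_times = [t + r_pp for t in send_times]
    return send_times, reception_times
```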


6.2.1.2. Computation in process P1 

Let $C_{SU,P1}$ denote the computation delay in process P1. The computation delay is measured from the 
end of the deskewing window. $C_{SU,P1}$ is assumed to be a constant independent of the computation input 
and the system state. Let $T_{SU,P1,C,i}$ denote the local time at which the Computation Process outputs the i-th 
result for process P1. 

$T_{SU,P1,C,i} = T_{SU,P1,RCV,E,i} + W_{Deskew,post} + C_{SU,P1}$ (6.58) 

6.2.2. Stage 2: P1 to P2 

6.2.2.1. Communication between processes P1 and P2 

Let $T_{SU,P1-P2,REF,i}$ denote the reference time for the i-th transmission between P1 and P2. We choose the 
reference times for the transmissions between P1 and P2 to be the same as for the transmissions between 
P0 and P1. 

$T_{SU,P1-P2,REF,i} = T_{SU,P0-P1,REF,i} = T_{SU} + i\Delta_{SU}$ (6.59) 

Let $\Delta_{SU,P1,REF-SND}$ denote the delay from the reference time to the send time for the transmissions from 
process P1 to process P2. $S_{SU,P1}$ denotes the Send Process delay for process P1 measured from $T_{SU,P1,C,i}$ to 
the corresponding send time, $\Delta_{SU,P2,RCVWND}$ denotes the delay to open the reception window in process P2 
measured from the communication reference time to the opening of the input window, $T_{SU,P1,SND,i}$ denotes 
the send time for the i-th message in process P1, and $T_{SU,P2,RCV,E,i}$ denotes the expected time of reception 
for the i-th message in process P2. 

Since process P2 starts receiving messages after process P1 does, it is expected that process P2 will 
have ample time to open its input windows before the messages start arriving. Therefore, we only need to 
consider the case in which process P1 sends the messages as soon as possible. 

$\Delta_{SU,P1,REF-SND}$ is measured from the reference times for the transmissions between P1 and P2. As such, 
in this case, $\Delta_{SU,P1,REF-SND}$ includes the delay from the reference time until the time at which the 
Computation Process in P1 outputs the results. The actual Send Process delay is only $S_{SU,P1}|_{min}$. 

So: 

$\Delta_{SU,P1,REF-SND} = S_{SU,P0} + R_{PP} + W_{Deskew,post} + C_{SU,P1} + S_{SU,P1}|_{min}$ (6.60) 

$S_{SU,P1} = S_{SU,P1}|_{min}$ (6.61) 

And: 

$\Delta_{SU,P2,RCVWND} = \Delta_{SU,P1,RCVWND} + W_{Deskew} + C_{SU,P1} + S_{SU,P1}|_{min} + R_{PP} - W_{Deskew,pre}$ (6.62) 
$= \Delta_{SU,P1,RCVWND} + W_{Deskew,post} + C_{SU,P1} + S_{SU,P1}|_{min} + R_{PP}$ (6.63) 

In addition: 

$T_{SU,P1,SND,i} = T_{SU,P1,C,i} + S_{SU,P1}$ 
$= T_{SU,P1,RCV,E,i} + W_{Deskew,post} + C_{SU,P1} + S_{SU,P1}|_{min}$ (6.64) 

And: 

$T_{SU,P2,RCV,E,i} = T_{SU,P1,SND,i} + R_{PP}$ 
$= T_{SU,P1,RCV,E,i} + W_{Deskew,post} + C_{SU,P1} + S_{SU,P1}|_{min} + R_{PP}$ (6.65) 
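
The step from (6.62) to (6.63) can be spelled out. The short derivation below assumes that the deskewing window is the sum of its two parts, $W_{Deskew} = W_{Deskew,pre} + W_{Deskew,post}$. The two equations are consistent with each other only under that relation, but it is stated here as an assumption rather than quoted from the text.

```latex
\begin{align*}
\Delta_{SU,P2,RCVWND}
  &= \Delta_{SU,P1,RCVWND} + W_{Deskew} + C_{SU,P1} + S_{SU,P1}|_{min} + R_{PP} - W_{Deskew,pre} \\
  &= \Delta_{SU,P1,RCVWND} + (W_{Deskew,pre} + W_{Deskew,post}) - W_{Deskew,pre}
     + C_{SU,P1} + S_{SU,P1}|_{min} + R_{PP} \\
  &= \Delta_{SU,P1,RCVWND} + W_{Deskew,post} + C_{SU,P1} + S_{SU,P1}|_{min} + R_{PP}
\end{align*}
```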

6.2.2.2. Computation in process P2 

Let $C_{SU,P2}$ denote the computation delay in process P2. The computation delay is measured from the 
end of the deskewing window. $C_{SU,P2}$ is assumed to be a constant independent of the computation input 
and the system state. Let $T_{SU,P2,C,i}$ denote the local time at which the Computation Process outputs the i-th 
result for process P2. 

$T_{SU,P2,C,i} = T_{SU,P2,RCV,E,i} + W_{Deskew,post} + C_{SU,P2}$ (6.66) 

6.2.3. Stage 3: P2 to P3 

6.2.3.1. Communication between processes P2 and P3 

The communication pattern of the Schedule Update protocol consists of two passes through the 
system. Conceptually, we think of the protocol as processing two streams. The first stream is processed 
from P0 to P2, and the second stream is processed from P2 to P4. 

Let $\Delta_{SU,STREAM,P0-P2}|_{min}$ denote the desired minimum separation between the two streams. This 
separation constraint is measured from the last message of the first stream to the first message of the 
second stream, and it is assumed to apply at the inputs and outputs of the BIUs and the RMUs. 
$\Delta_{SU,STREAM,P0-P2}|_{BIU}$ and $\Delta_{SU,STREAM,P0-P2}|_{RMU}$ denote the actual separation between the streams at the BIUs in 
process P2 and at the RMUs in process P3, respectively. 

$\Delta_{SU,STREAM,P0-P2}|_{min}$ must be greater than or equal to the stream's data introduction interval. 

$\Delta_{SU,STREAM,P0-P2}|_{min} \geq \Delta_{SU}$ (6.67) 

At P3, we want the expected reception intervals for the first and the second stream to be separated by 
at least one tick. This is captured by the following constraint. 

$\Delta_{SU,STREAM,P0-P2}|_{min} \geq W_{Deskew} + 1$ (6.68) 

The effective allowed minimum separation between the streams is: 

$\Delta_{SU,STREAM,P0-P2}|_{min,eff} = \max(\Delta_{SU,STREAM,P0-P2}|_{min}, \Delta_{SU}, W_{Deskew} + 1)$ (6.69) 
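
Equation (6.69) combines the desired separation with the two lower bounds (6.67) and (6.68). A minimal sketch, with hypothetical function and parameter names and integer tick values assumed:

```python
def effective_min_stream_separation(sep_min, interval, w_deskew):
    """Effective minimum separation between the two streams, Eq. (6.69).

    sep_min  : desired minimum separation between the streams
    interval : data introduction interval of the stream, bound (6.67)
    w_deskew : deskewing window width; bound (6.68) requires w_deskew + 1
    """
    return max(sep_min, interval, w_deskew + 1)
```

For instance, a desired separation of 3 ticks with a 4-tick introduction interval and a 6-tick deskewing window gives an effective separation of 7 ticks.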

Let $T_{SU,P2-P3,REF,i}$ denote the reference time for the i-th transmission between P2 and P3. We choose 
these reference times to be equal to the times at which the Computation Process of P2 outputs the 
corresponding results. 

$T_{SU,P2-P3,REF,i} = T_{SU,P2,C,i}$ (6.70) 

$S_{SU,P2}$ denotes the Send Process delay for process P2, $T_{SU,P2,SND,i}$ denotes the send time for the i-th 
message in process P2, and $T_{SU,P3,RCV,E,i}$ denotes the expected time of reception for the i-th message in 
process P3. 

$T_{SU,P2,SND,i}$ must be chosen to ensure that the constraint on the minimum stream separation is satisfied. 
Two cases must be considered. 

Case 1: $T_{SU,P2,C,0} + S_{SU,P2}|_{min} < T_{SU,P0,SND,N-1} + \Delta_{SU,STREAM,P0-P2}|_{min,eff}$ 

For this case, the results of the computation in P2 must be buffered until they can be sent with the 
proper separation from the first stream. So: 

$\Delta_{SU,STREAM,P0-P2}|_{BIU} = \Delta_{SU,STREAM,P0-P2}|_{min,eff}$ (6.71) 

Let $\Delta_{SU,P2,BUF}$ denote the buffering delay for the stream at P2. 

$\Delta_{SU,P2,BUF} = (T_{SU,P0,SND,N-1} + \Delta_{SU,STREAM,P0-P2}|_{min,eff}) - (T_{SU,P2,C,0} + S_{SU,P2}|_{min})$ (6.72) 

The send delay for process P2 is: 

$S_{SU,P2} = S_{SU,P2}|_{min} + \Delta_{SU,P2,BUF}$ (6.73) 

So: 

$T_{SU,P2,SND,i} = T_{SU,P2,C,i} + S_{SU,P2}$ 
$= T_{SU,P2,RCV,E,i} + W_{Deskew,post} + C_{SU,P2} + S_{SU,P2}$ (6.74) 

And: 

$T_{SU,P3,RCV,E,i} = T_{SU,P2,SND,i} + R_{PP}$ 
$= T_{SU,P2,RCV,E,i} + W_{Deskew,post} + C_{SU,P2} + S_{SU,P2} + R_{PP}$ (6.75) 

Case 2: $T_{SU,P2,C,0} + S_{SU,P2}|_{min} \geq T_{SU,P0,SND,N-1} + \Delta_{SU,STREAM,P0-P2}|_{min,eff}$ 

For this case, the results of the computation can be sent as soon as possible. The send delay is the 
minimum value. 

$S_{SU,P2} = S_{SU,P2}|_{min}$ (6.76) 

So: 

$T_{SU,P2,SND,i} = T_{SU,P2,C,i} + S_{SU,P2}$ 
$= T_{SU,P2,RCV,E,i} + W_{Deskew,post} + C_{SU,P2} + S_{SU,P2}|_{min}$ (6.77) 

And: 

$T_{SU,P3,RCV,E,i} = T_{SU,P2,SND,i} + R_{PP}$ 
$= T_{SU,P2,RCV,E,i} + W_{Deskew,post} + C_{SU,P2} + S_{SU,P2}|_{min} + R_{PP}$ (6.78) 

Therefore, the actual separation between the streams at the BIUs is: 

$\Delta_{SU,STREAM,P0-P2}|_{BIU} = T_{SU,P2,SND,0} - T_{SU,P0,SND,N-1}$ (6.79) 
$= T_{SU,P2,SND,0} - [T_{SU,P0,SND,0} + (N-1)\Delta_{SU}]$ (6.80) 
$= 2R_{PP} + 2W_{Deskew,post} + (C_{SU,P1} + C_{SU,P2}) + (S_{SU,P1} + S_{SU,P2}) - (N-1)\Delta_{SU}$ (6.81) 
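
The buffering rule for the second stream can be sketched as follows. The names are hypothetical, and the buffering delay is taken to be zero in Case 2, which the text implies but does not state explicitly.

```python
def p2_send_delay(t_p2_c0, s_p2_min, t_p0_snd_last, sep_eff):
    """Send delay of process P2 per Eqs. (6.71)-(6.79).

    t_p2_c0       : T_SU,P2,C,0, time of the first computation result in P2
    s_p2_min      : minimum send delay of process P2
    t_p0_snd_last : T_SU,P0,SND,N-1, send time of the last first-stream message
    sep_eff       : effective minimum stream separation, Eq. (6.69)
    Returns (send_delay, buffering_delay, actual_separation).
    """
    earliest_allowed = t_p0_snd_last + sep_eff
    # Case 1 buffers the results; Case 2 (assumed zero buffering) does not.
    buffering_delay = max(0, earliest_allowed - (t_p2_c0 + s_p2_min))  # (6.72)
    send_delay = s_p2_min + buffering_delay                            # (6.73), (6.76)
    actual_separation = t_p2_c0 + send_delay - t_p0_snd_last           # (6.79)
    return send_delay, buffering_delay, actual_separation
```

When buffering is needed the actual separation equals the effective minimum, as in (6.71). Otherwise it is whatever the pipeline delays naturally produce, as in (6.81).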

6.2.3.2. Computation in process P3 

Let $C_{SU,P3}$ denote the computation delay in process P3. The computation delay is measured from the 
end of the deskewing window. $C_{SU,P3}$ is assumed to be a constant independent of the computation input 
and the system state. Let $T_{SU,P3,C,i}$ denote the local time at which the Computation Process outputs the i-th 
result for process P3. 

$T_{SU,P3,C,i} = T_{SU,P3,RCV,E,i} + W_{Deskew,post} + C_{SU,P3}$ (6.82) 

6.2.4. Stage 4: P3 to P4 

6.2.4.1. Communication between processes P3 and P4 

$\Delta_{SU,STREAM,P0-P2}|_{min,eff}$ also applies to the communication between P3 and P4. Let $S_{SU,P3}$ denote the Send 
Process delay for process P3. $S_{SU,P3}$ is measured from the time the Computation Process outputs a 
message until the time the message is sent. Let $T_{SU,P3-P4,REF,i}$ denote the reference time for the i-th 
transmission between P3 and P4. We choose these reference times to be equal to the times at which 
the Computation Process outputs the corresponding results. 

$T_{SU,P3-P4,REF,i} = T_{SU,P3,C,i}$ (6.83) 

$T_{SU,P3,SND,i}$ denotes the send time for the i-th message in process P3, and $T_{SU,P4,RCV,E,i}$ denotes the 
expected time of reception for the i-th message in process P4. $T_{SU,P3,SND,i}$ must be chosen to ensure that 
the constraint on the minimum stream separation is satisfied. Two cases must be considered. 
Case 1: $T_{SU,P3,C,0} + S_{SU,P3}|_{min} < T_{SU,P1,SND,N-1} + \Delta_{SU,STREAM,P0-P2}|_{min,eff}$ 

For this case, the results of the computation in P3 must be buffered until they can be sent with the 
proper separation from the first stream. So: 

$\Delta_{SU,STREAM,P0-P2}|_{RMU} = \Delta_{SU,STREAM,P0-P2}|_{min,eff}$ (6.84) 

Let $\Delta_{SU,P3,BUF}$ denote the buffering delay for the stream at P3. 

$\Delta_{SU,P3,BUF} = (T_{SU,P1,SND,N-1} + \Delta_{SU,STREAM,P0-P2}|_{min,eff}) - (T_{SU,P3,C,0} + S_{SU,P3}|_{min})$ (6.85) 

The send delay for process P3 is: 

$S_{SU,P3} = S_{SU,P3}|_{min} + \Delta_{SU,P3,BUF}$ (6.86) 

So: 

$T_{SU,P3,SND,i} = T_{SU,P3,C,i} + S_{SU,P3}$ 
$= T_{SU,P3,RCV,E,i} + W_{Deskew,post} + C_{SU,P3} + S_{SU,P3}$ (6.87) 

And: 

$T_{SU,P4,RCV,E,i} = T_{SU,P3,SND,i} + R_{PP}$ 
$= T_{SU,P3,RCV,E,i} + W_{Deskew,post} + C_{SU,P3} + S_{SU,P3} + R_{PP}$ (6.88) 

Case 2: $T_{SU,P3,C,0} + S_{SU,P3}|_{min} \geq T_{SU,P1,SND,N-1} + \Delta_{SU,STREAM,P0-P2}|_{min,eff}$ 

For this case, the results of the computation can be sent as soon as possible. The send delay is the 
minimum value. 

$S_{SU,P3} = S_{SU,P3}|_{min}$ (6.89) 

So: 

$T_{SU,P3,SND,i} = T_{SU,P3,C,i} + S_{SU,P3}$ 
$= T_{SU,P3,RCV,E,i} + W_{Deskew,post} + C_{SU,P3} + S_{SU,P3}|_{min}$ (6.90) 

And: 

$T_{SU,P4,RCV,E,i} = T_{SU,P3,SND,i} + R_{PP}$ 
$= T_{SU,P3,RCV,E,i} + W_{Deskew,post} + C_{SU,P3} + S_{SU,P3}|_{min} + R_{PP}$ (6.91) 

Therefore: 

$\Delta_{SU,STREAM,P0-P2}|_{RMU} = T_{SU,P3,SND,0} - T_{SU,P1,SND,N-1}$ (6.92) 
$= T_{SU,P3,SND,0} - [T_{SU,P1,SND,0} + (N-1)\Delta_{SU}]$ (6.93) 
$= 2R_{PP} + 2W_{Deskew,post} + (C_{SU,P2} + C_{SU,P3}) + (S_{SU,P2} + S_{SU,P3}) - (N-1)\Delta_{SU}$ (6.94) 


6.2.4.2. Computation in process P4 

Let $C_{SU,P4}$ denote the computation delay in process P4. The computation delay is measured from the 
end of the deskewing window. $C_{SU,P4}$ is assumed to be a constant independent of the computation input 
and the system state. Let $T_{SU,P4,C,i}$ denote the local time at which the Computation Process outputs the i-th 
result for process P4. 

$T_{SU,P4,C,i} = T_{SU,P4,RCV,E,i} + W_{Deskew,post} + C_{SU,P4}$ (6.95) 

6.2.5. Duration of the protocol 

Let $\Delta_{SU,P4,C-END}$ denote the delay in process P4 from the end of the computation for the last message to 
the end of the Schedule Update protocol. $\Delta_{SU,P4,C-END}$ is assumed to be a constant independent of the 
system state. Let $T_{SU,END}$ denote the local time at which the execution of the Schedule Update protocol 
ends. 

$T_{SU,END} = T_{SU,P4,C,N-1} + \Delta_{SU,P4,C-END}$ (6.96) 

Let $\Delta_{SU}$ denote the duration of the Schedule Update protocol execution. 

$\Delta_{SU} = T_{SU,END} - T_{SU}$ (6.97) 


6.3. PE Communication and Accusation Exchange Protocols 

Figures 6.4 and 6.5 show the message flow graphs for the PE Broadcast and Accusation Exchange 
protocols, respectively. For this RPP, these protocols are implemented as a single protocol composed of 
two phases. In the first phase, the scheduled PE messages are processed as a stream. In the second phase, 
the protocol adds a message containing accusations against nodes of the opposite kind. The processing of 
the scheduled PE messages is examined next. The analysis of the accusations messages is presented after 
that. 


Figure 6.4: Message flow graph for the PE Broadcast protocol (node rows in the graph: PEs, BIUs, RMUs) 
Figure 6.5: Message flow graph for the Accusation Exchange protocol (node rows in the graph: BIUs, RMUs) 


6.3.1. Scheduled PE messages in stage 1: P0 to P1 

Let $T_{PE}$ denote the local-time trigger for the transition to PE Communication mode. $K_{PE,sched}$ (see footnote 1) denotes 
the number of scheduled PE messages. Let $\Delta_{PE,sched}$ denote the data introduction interval for the stream of 
scheduled PE messages. $\Delta_{PE,sched}$ must satisfy the constraints on the minimum data introduction intervals 
for the Communication and Computation modules. 

$\Delta_{PE,sched} \geq \max(\Delta_{Comm}, \Delta_{Comp})$ (6.98) 

Let i denote the index for the stream of scheduled PE messages. 

$0 \leq i \leq K_{PE,sched} - 1$ (6.99) 

Let $S_{PE,P0,sched}$ denote the Send Process delay for scheduled PE messages in process P0. We assume 
that $S_{PE,P0,sched}$ is a constant independent of process input. Let $\Delta_{PE,P1,RCVWND,sched}$ denote the delay from the 
communication reference time to the opening of the reception window for scheduled PE messages in 
process P1. We assume that $\Delta_{PE,P1,RCVWND,sched}$ is a constant. $T_{PE,P0,SND,sched,i}$ denotes the send time for the 
i-th scheduled PE message in process P0, and $T_{PE,P1,RCV,E,sched,i}$ denotes the expected time of reception for 
the i-th scheduled PE message in process P1. Let $T_{PE,P0-P1,REF,sched,i}$ denote the reference time for the 
transmission of the i-th scheduled PE message between P0 and P1. 

$T_{PE,P0-P1,REF,sched,i} = T_{PE} + i\Delta_{PE,sched}$ (6.100) 

For the version of the RPP covered in this document, the RPP sends the schedule-assessment message 
to the PE triggered by the beginning of the PE Communication minor mode. The RPP sends the message 
to the PE $\Delta_{PE,SND-SA}$ ticks after the beginning of the protocol at time $T_{PE}$. $\Delta_{PE,SND-SA-READ-PE}$ denotes the 
delay from the time the schedule assessment message is sent to the PE to the time at which the first PE 
message is read from a PE. The RPP is designed to read each PE message 1 tick before it is to be sent. 
The absolute minimum value of $S_{PE,P0,sched}$, denoted by $S_{PE,P0,sched}|_{min}$, and the delay to read the first PE 
message constrain the actual minimum value of $S_{PE,P0,sched}$ to the following effective minimum value. 

$S_{PE,P0,sched}|_{min,eff} = \max(S_{PE,P0,sched}|_{min}, \Delta_{PE,SND-SA} + \Delta_{PE,SND-SA-READ-PE}|_{min} + 1)$ (6.101) 

6.3.1.1. Communication between processes P0 and P1 

Two cases must be considered. 

Case 1: $S_{PE,P0,sched}|_{min,eff} + R_{PP} \geq \Delta_{PE,P1,RCVWND,sched}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{PE,P0,sched} = S_{PE,P0,sched}|_{min,eff}$ (6.102) 

$\Delta_{PE,P1,RCVWND,sched} = S_{PE,P0,sched}|_{min,eff} + R_{PP} - W_{Deskew,pre}$ (6.103) 

So: 

$T_{PE,P0,SND,sched,i} = T_{PE,P0-P1,REF,sched,i} + S_{PE,P0,sched}|_{min,eff}$ 
$= T_{PE} + i\Delta_{PE,sched} + S_{PE,P0,sched}|_{min,eff}$ (6.104) 

And: 

$T_{PE,P1,RCV,E,sched,i} = T_{PE,P0,SND,sched,i} + R_{PP}$ 
$= T_{PE} + i\Delta_{PE,sched} + S_{PE,P0,sched}|_{min,eff} + R_{PP}$ (6.105) 

Case 2: $S_{PE,P0,sched}|_{min,eff} + R_{PP} < \Delta_{PE,P1,RCVWND,sched}|_{min} + W_{Deskew,pre}$ 

For this case: 

$S_{PE,P0,sched} = \Delta_{PE,P1,RCVWND,sched}|_{min} + W_{Deskew,pre} - R_{PP}$ (6.106) 

$\Delta_{PE,P1,RCVWND,sched} = \Delta_{PE,P1,RCVWND,sched}|_{min}$ (6.107) 

So: 

$T_{PE,P0,SND,sched,i} = T_{PE,P0-P1,REF,sched,i} + S_{PE,P0,sched}$ 
$= T_{PE} + i\Delta_{PE,sched} + \Delta_{PE,P1,RCVWND,sched}|_{min} + W_{Deskew,pre} - R_{PP}$ (6.108) 

And: 

$T_{PE,P1,RCV,E,sched,i} = T_{PE,P0,SND,sched,i} + R_{PP}$ 
$= T_{PE} + i\Delta_{PE,sched} + \Delta_{PE,P1,RCVWND,sched}|_{min} + W_{Deskew,pre}$ (6.109) 

1 $K_{PE,sched}$ is denoted simply as K when used as a subscript. 

6.3.1.2. Computation in process P1 

Let $C_{PE,P1,sched}$ denote the computation delay for scheduled PE messages in process P1. The 
computation delay is measured from the end of the deskewing window for each message. $C_{PE,P1,sched}$ is 
assumed to be a constant independent of the computation input and the system state. Let $T_{PE,P1,C,sched,i}$ 
denote the local time at which the Computation Process outputs the i-th result for process P1. 

$T_{PE,P1,C,sched,i} = T_{PE,P1,RCV,E,sched,i} + W_{Deskew,post} + C_{PE,P1,sched}$ (6.110) 
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6.3.2. Scheduled PE messages in stage 2: PI to P2 


6.3.2. 1. Communication between processes PI and P2 

Let T su,pi-P2,REF,sched,i denote the reference time for the i-th transmission between processes PI and P2. 
We choose the reference times for the transmissions between PI and P2 to be the same as for the 
transmissions between PO and PI. 

TpE,Pl-P2,REF,sched,i = T PE ,P0-Pl,REF,sched,i = TpE + iApE.sched (6.111) 

$\Delta_{PE,P1,REF-SND,sched}$ denotes the delay from the reference time to the send time for the transmissions of
scheduled PE messages from process P1 to process P2, $S_{PE,P1,sched}$ denotes the Send Process delay for
process P1 measured from $T_{PE,P1,C,sched,i}$ to the corresponding send time, $\Delta_{PE,P2,RCVWND,sched}$ denotes the
delay in the opening of the reception window in process P2 measured from the communication reference
time to the opening of the input window, $T_{PE,P1,SND,sched,i}$ denotes the send time for the i-th message in
process P1, and $T_{PE,P2,RCV,E,sched,i}$ denotes the expected time of reception for the i-th message in process P2.
Since process P2 starts receiving messages after process P1 does, it is expected that process P2 will have
ample time to open its input windows before the messages start arriving. Therefore, we only consider the
case in which process P1 sends the messages as soon as possible.

$\Delta_{PE,P1,REF-SND,sched}$ is measured from the reference times for the transmissions between P1 and P2 of
scheduled PE messages. As such, in this case, $\Delta_{PE,P1,REF-SND,sched}$ includes the delay from the reference
time to the time at which the Computation Process in P1 outputs the results. The actual Send Process
delay is only $S_{PE,P1,sched}|_{min}$. So:

$\Delta_{PE,P1,REF-SND,sched} = S_{PE,P0,sched} + R_{PP} + W_{Deskew,post} + C_{PE,P1,sched} + S_{PE,P1,sched}|_{min}$ (6.112)

And:

$S_{PE,P1,sched} = S_{PE,P1,sched}|_{min}$ (6.113)

Also:

$\Delta_{PE,P2,RCVWND,sched} = \Delta_{PE,P1,RCVWND,sched} + W_{Deskew} + C_{PE,P1,sched} + S_{PE,P1,sched}|_{min} + R_{PP} - W_{Deskew,pre}$ (6.114)

$= \Delta_{PE,P1,RCVWND,sched} + W_{Deskew,post} + C_{PE,P1,sched} + S_{PE,P1,sched}|_{min} + R_{PP}$ (6.115)

And:

$T_{PE,P1,SND,sched,i} = T_{PE,P1,C,sched,i} + S_{PE,P1,sched}|_{min}$ (6.116)

$= T_{PE,P1,RCV,E,sched,i} + W_{Deskew,post} + C_{PE,P1,sched} + S_{PE,P1,sched}|_{min}$ (6.117)

And:

$T_{PE,P2,RCV,E,sched,i} = T_{PE,P1,SND,sched,i} + R_{PP}$ (6.118)

$= T_{PE,P1,RCV,E,sched,i} + W_{Deskew,post} + C_{PE,P1,sched} + S_{PE,P1,sched}|_{min} + R_{PP}$ (6.119)

6.3.2.2. Computation in process P2

Let $C_{PE,P2,sched}$ denote the computation delay in process P2. The computation delay is measured from
the end of the deskewing window for each message. $C_{PE,P2,sched}$ is assumed to be a constant independent of
the computation input and the system state. Let $T_{PE,P2,C,sched,i}$ denote the local-time at which the
Computation Process outputs the i-th result for process P2.

$T_{PE,P2,C,sched,i} = T_{PE,P2,RCV,E,sched,i} + W_{Deskew,post} + C_{PE,P2,sched}$ (6.120)
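
The chain of send, receive, and computation times for the i-th scheduled PE message can be traced numerically. The following is a minimal Python sketch, assuming all quantities are integers in local clock ticks and that P0 sends each message so that it arrives exactly at the expected reception time in P1. The class name, function name, and field names are illustrative and do not come from the RPP design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedParams:
    """Illustrative parameter container; all values are in local clock ticks."""
    t_pe: int         # T_PE: local time at which PE Communication begins
    d_sched: int      # Delta_PE,sched: data introduction interval
    d_rcvwnd_p1: int  # Delta_PE,P1,RCVWND,sched|min
    w_pre: int        # W_Deskew,pre
    w_post: int       # W_Deskew,post
    r_pp: int         # R_PP: nominal point-to-point reception delay
    c_p1: int         # C_PE,P1,sched
    s_p1_min: int     # S_PE,P1,sched|min
    c_p2: int         # C_PE,P2,sched


def sched_timeline(p: SchedParams, i: int) -> dict:
    """Return the local-time milestones of the i-th scheduled PE message."""
    t_ref = p.t_pe + i * p.d_sched                       # reference time, (6.111)
    t_snd_p0 = t_ref + p.d_rcvwnd_p1 + p.w_pre - p.r_pp  # send time in P0
    t_rcv_p1 = t_snd_p0 + p.r_pp                         # expected reception in P1
    t_c_p1 = t_rcv_p1 + p.w_post + p.c_p1                # computation output in P1
    t_snd_p1 = t_c_p1 + p.s_p1_min                       # send time in P1, (6.116)
    t_rcv_p2 = t_snd_p1 + p.r_pp                         # expected reception in P2, (6.118)
    t_c_p2 = t_rcv_p2 + p.w_post + p.c_p2                # computation output in P2, (6.120)
    return {
        "ref": t_ref, "snd_p0": t_snd_p0, "rcv_p1": t_rcv_p1, "c_p1": t_c_p1,
        "snd_p1": t_snd_p1, "rcv_p2": t_rcv_p2, "c_p2": t_c_p2,
    }
```

Because every stage adds a constant delay, consecutive messages keep the same spacing $\Delta_{PE,sched}$ at every point of the pipeline.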

6.3.3. Accusations message in stage 1: P0 to P1

The timing of the accusations message can be analyzed as if it were part of a separate stream being 
processed after the stream of PE messages. The analysis presented here follows the analysis used for the 
Schedule Update protocol. 


6.3.3.1. Communication between processes P0 and P1

Let $\Delta_{PE,sched-acc}|_{min}$ denote the desired minimum separation between the stream of PE messages and the
accusations message. This separation constraint is measured from the last message of the PE message
stream to the accusations message, and it is assumed to apply at the inputs and outputs of the BIUs and
the RMUs. We assume that $\Delta_{PE,sched-acc}|_{min}$ is greater than or equal to the data introduction interval of the
PE message stream.

$\Delta_{PE,sched-acc}|_{min} \ge \Delta_{PE,sched}$ (6.121)

Let $T_{PE,P0,acc}$ denote the time at which the accusations are available for transmission in process P0. It is
assumed that $T_{PE,P0,acc}$ occurs at or after the reference time for transmission of the last PE message.

$T_{PE,P0,acc} \ge T_{PE} + (K_{PE,sched} - 1)\Delta_{PE,sched}$ (6.122)

Let $\Delta_{PE,P0,acc\_ready}$ denote the delay from the reference time for transmission of the last PE message to
the time at which the accusations are available for transmission in process P0.

$\Delta_{PE,P0,acc\_ready} = T_{PE,P0,acc} - [T_{PE} + (K_{PE,sched} - 1)\Delta_{PE,sched}]$ (6.123)

$S_{PE,P0,acc}$ denotes the Send Process delay for the accusations message in process P0, $T_{PE,P0,SND,acc}$
denotes the send time for the accusations message in process P0, $T_{PE,P1,RCV,E,acc}$ denotes the expected time
of reception for the accusations message in process P1, and $T_{PE,P0,SND,sched,K-1}$ denotes the send time for the
last scheduled PE message. $T_{PE,P0,SND,acc}$ must be chosen to ensure that the constraint on the minimum stream
separation is satisfied. Two cases must be considered.

Case 1. $T_{PE,P0,acc} + S_{PE,P0,acc}|_{min} < T_{PE,P0,SND,sched,K-1} + \Delta_{PE,sched-acc}|_{min}$

For this case, the accusations message must be buffered until it can be sent with the proper separation
from the PE message stream. Let $\Delta_{PE,P0,BUF,acc}$ denote the buffering delay for the accusations at P0.

$\Delta_{PE,P0,BUF,acc} = (T_{PE,P0,SND,sched,K-1} + \Delta_{PE,sched-acc}|_{min}) - (T_{PE,P0,acc} + S_{PE,P0,acc}|_{min})$ (6.124)

The send delay for the accusations message in process P0 is:

$S_{PE,P0,acc} = S_{PE,P0,acc}|_{min} + \Delta_{PE,P0,BUF,acc}$ (6.125)

So:

$T_{PE,P0,SND,acc} = T_{PE,P0,acc} + S_{PE,P0,acc}$ (6.126)

And:

$T_{PE,P1,RCV,E,acc} = T_{PE,P0,SND,acc} + R_{PP}$

$= T_{PE,P0,acc} + S_{PE,P0,acc} + R_{PP}$ (6.127)

Case 2. $T_{PE,P0,acc} + S_{PE,P0,acc}|_{min} \ge T_{PE,P0,SND,sched,K-1} + \Delta_{PE,sched-acc}|_{min}$

For this case, the accusations message can be sent as soon as possible. The send delay is the minimum
value.

$S_{PE,P0,acc} = S_{PE,P0,acc}|_{min}$ (6.128)

So:

$T_{PE,P0,SND,acc} = T_{PE,P0,acc} + S_{PE,P0,acc}$

$= T_{PE,P0,acc} + S_{PE,P0,acc}|_{min}$ (6.129)

And:

$T_{PE,P1,RCV,E,acc} = T_{PE,P0,SND,acc} + R_{PP}$

$= T_{PE,P0,acc} + S_{PE,P0,acc}|_{min} + R_{PP}$ (6.130)
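
The two cases differ only in whether a buffering delay is needed, so they can be folded into a single computation in which the buffering delay is clamped at zero. The following is a minimal Python sketch under that reading, assuming integer local clock ticks; the function and parameter names are illustrative and are not part of the RPP design.

```python
def accusation_send(t_acc, s_acc_min, t_snd_last, d_sep_min, r_pp):
    """Return (send delay, send time, expected reception time) for the accusations.

    t_acc      -- time at which the accusations are available for transmission
    s_acc_min  -- minimum Send Process delay for the accusations message
    t_snd_last -- send time of the last scheduled PE message
    d_sep_min  -- minimum separation between the PE stream and the accusations
    r_pp       -- nominal point-to-point reception delay
    """
    earliest_send = t_acc + s_acc_min          # send time with no buffering
    required_send = t_snd_last + d_sep_min     # earliest time that keeps the separation
    buffering = max(0, required_send - earliest_send)  # (6.124) in Case 1, 0 in Case 2
    s_acc = s_acc_min + buffering              # (6.125) or (6.128)
    t_snd = t_acc + s_acc                      # (6.126) or (6.129)
    t_rcv = t_snd + r_pp                       # (6.127) or (6.130)
    return s_acc, t_snd, t_rcv
```

For example, with accusations ready at tick 100, a minimum send delay of 2, a last PE message sent at tick 110, and a required separation of 20, the message is held until tick 130. If the accusations are instead ready at tick 200, they are sent at tick 202 without buffering.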


6.3.3.2. Computation in process P1

Let $C_{PE,P1,acc}$ denote the computation delay in process P1 for the accusations message. The
computation delay is measured from the end of the deskewing window. Let $T_{PE,P1,C,acc}$ denote the
local-time at which the Computation Process outputs the result.

$T_{PE,P1,C,acc} = T_{PE,P1,RCV,E,acc} + W_{Deskew,post} + C_{PE,P1,acc}$ (6.131)

6.3.4. Accusations message in stage 2: P1 to P2

Process P1 generates a set of accusations that are not necessarily related to the accusations received
from P0. The processing of these accusations is analyzed independently. However, it is assumed that the 
constraint on the minimum separation between the stream of PE messages and the accusations message 
also applies here. 

6.3.4.1. Communication between processes P1 and P2

Let $T_{PE,P1,acc}$ denote the time at which the accusations are available for transmission in process P1.
$T_{PE,P1,C,sched,K-1}$ denotes the local time at which the Computation Process outputs the result for the last
scheduled PE message. It is assumed that $T_{PE,P1,acc}$ occurs at or after the time at which the Computation
Process outputs the result for the last PE message.

$T_{PE,P1,acc} \ge T_{PE,P1,C,sched,K-1}$ (6.132)

Let $\Delta_{PE,P1,acc\_ready}$ denote the delay from the time at which the Computation Process outputs the result
for the last PE message to the time at which the accusations are available for transmission in process P1.

$\Delta_{PE,P1,acc\_ready} = T_{PE,P1,acc} - T_{PE,P1,C,sched,K-1}$ (6.133)

$S_{PE,P1,acc}$ denotes the Send Process delay for the accusations message in process P1, $T_{PE,P1,SND,acc}$
denotes the send time for the accusations message in process P1, $T_{PE,P2,RCV,E,acc}$ denotes the expected time
of reception for the accusations message in process P2, and $T_{PE,P1,SND,sched,K-1}$ denotes the send time for the
last scheduled PE message. $T_{PE,P1,SND,acc}$ must be chosen to ensure that the constraint on the minimum
stream separation is satisfied. Two cases must be considered.

Case 1. $T_{PE,P1,acc} + S_{PE,P1,acc}|_{min} < T_{PE,P1,SND,sched,K-1} + \Delta_{PE,sched-acc}|_{min}$

For this case, the accusations message must be buffered until it can be sent with the proper separation
from the PE message stream. Let $\Delta_{PE,P1,BUF,acc}$ denote the buffering delay for the accusations at P1.

$\Delta_{PE,P1,BUF,acc} = (T_{PE,P1,SND,sched,K-1} + \Delta_{PE,sched-acc}|_{min}) - (T_{PE,P1,acc} + S_{PE,P1,acc}|_{min})$ (6.134)

The send delay for the accusations message in process P1 is:

$S_{PE,P1,acc} = S_{PE,P1,acc}|_{min} + \Delta_{PE,P1,BUF,acc}$ (6.135)

So:

$T_{PE,P1,SND,acc} = T_{PE,P1,acc} + S_{PE,P1,acc}$ (6.136)

And:

$T_{PE,P2,RCV,E,acc} = T_{PE,P1,SND,acc} + R_{PP}$

$= T_{PE,P1,acc} + S_{PE,P1,acc} + R_{PP}$ (6.137)

Case 2. $T_{PE,P1,acc} + S_{PE,P1,acc}|_{min} \ge T_{PE,P1,SND,sched,K-1} + \Delta_{PE,sched-acc}|_{min}$

For this case, the accusations message can be sent as soon as possible. The send delay is the minimum
value.

$S_{PE,P1,acc} = S_{PE,P1,acc}|_{min}$ (6.138)

So:

$T_{PE,P1,SND,acc} = T_{PE,P1,acc} + S_{PE,P1,acc}$

$= T_{PE,P1,acc} + S_{PE,P1,acc}|_{min}$ (6.139)

And:

$T_{PE,P2,RCV,E,acc} = T_{PE,P1,SND,acc} + R_{PP}$

$= T_{PE,P1,acc} + S_{PE,P1,acc}|_{min} + R_{PP}$ (6.140)


6.3.4.2. Computation in process P2

Let $C_{PE,P2,acc}$ denote the computation delay in process P2 for the accusations message. The computation
delay is measured from the end of the deskewing window. Let $T_{PE,P2,C,acc}$ denote the local-time at which
the Computation Process outputs the result.

$T_{PE,P2,C,acc} = T_{PE,P2,RCV,E,acc} + W_{Deskew,post} + C_{PE,P2,acc}$ (6.141)

6.3.5. Duration of the protocol 

Let $\Delta_{PE,P2,acc,C-END}$ denote the delay in process P2 from the end of the computation to the end of the PE
Communication mode protocols. $\Delta_{PE,P2,acc,C-END}$ is assumed to be a constant independent of the system
state.

Let $T_{PE,END}$ denote the local-time at which the execution of the PE Communication protocols ends.

$T_{PE,END} = T_{PE,P2,C,acc} + \Delta_{PE,P2,acc,C-END}$ (6.142)

Let $\Delta_{PE}$ denote the duration of the PE Communication protocol.

$\Delta_{PE} = T_{PE,END} - T_{PE}$ (6.143)


6.3.6. Bound on the number of PE messages 

The execution of the PE Communication protocol must end at or before the time at which the 
execution of the Synchronization Preservation protocol is scheduled to begin. Let $T_{SP}$ denote the
local-time trigger for the execution of the Synchronization Preservation protocol. If no additional time is
needed from the end of the PE Communication to the beginning of the Synchronization Preservation
protocol, the following relation expresses the constraint on the duration of the PE Communication
protocol.

$T_{PE,END}|_{max} = T_{SP}$ (6.144)

So:

$\Delta_{PE}|_{max} = T_{SP} - T_{PE}$ (6.145)



To compute the maximum number of PE messages that can be processed, we need to determine the
time consumed in processing the accusation messages. $K_{PE,sched}|_{max}$ denotes the maximum number of
scheduled PE messages that can be processed, and $\Delta_{PE,acc}$ denotes the time delay from the reference time
for the transmission of the last scheduled PE message from P0 to P1 to the end of the protocol.

$\Delta_{PE,acc} = T_{PE,END} - [T_{PE} + (K_{PE,sched} - 1)\Delta_{PE,sched}]$

$= S_{PE,P0,sched} + R_{PP} + W_{Deskew,post} + C_{PE,P1,sched}$
$+ \Delta_{PE,P1,acc\_ready} + S_{PE,P1,acc} + R_{PP} + W_{Deskew,post} + C_{PE,P2,acc} + \Delta_{PE,P2,acc,C-END}$

$= (S_{PE,P0,sched} + S_{PE,P1,acc}) + 2(R_{PP} + W_{Deskew,post})$
$+ (C_{PE,P1,sched} + C_{PE,P2,acc}) + (\Delta_{PE,P1,acc\_ready} + \Delta_{PE,P2,acc,C-END})$ (6.146)

So:

$K_{PE,sched}|_{max} = \lfloor (T_{SP} - T_{PE} - \Delta_{PE,acc}) / \Delta_{PE,sched} \rfloor + 1$ (6.147)
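
As a numerical illustration of (6.146) and (6.147), the following minimal Python sketch computes the accusation overhead and the resulting bound on the number of scheduled PE messages. It assumes all quantities are integers in local clock ticks; the function and parameter names are illustrative only.

```python
def delta_pe_acc(s_p0_sched, s_p1_acc, r_pp, w_post,
                 c_p1_sched, c_p2_acc, d_p1_acc_ready, d_p2_acc_c_end):
    """Delay from the last scheduled reference time to the end of the protocol, (6.146)."""
    return ((s_p0_sched + s_p1_acc)
            + 2 * (r_pp + w_post)
            + (c_p1_sched + c_p2_acc)
            + (d_p1_acc_ready + d_p2_acc_c_end))


def max_sched_messages(t_sp, t_pe, d_pe_acc, d_sched):
    """Largest K such that T_PE + (K - 1) * d_sched + d_pe_acc <= T_SP, (6.147)."""
    if d_sched <= 0:
        raise ValueError("the data introduction interval must be positive")
    # Integer floor division implements the floor in (6.147).
    return (t_sp - t_pe - d_pe_acc) // d_sched + 1
```

A result of zero or less means that not even one scheduled PE message fits between $T_{PE}$ and $T_{SP}$ with the given overhead.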


6.3.7. PE message latency 

Let $\Delta_{PE,latency}$ denote the latency for PE messages. For this analysis, we define the PE message latency
as the time from the reference time for the transmission of the i-th scheduled PE message by process P0
until the completion of the corresponding computation in process P2. So:

$\Delta_{PE,latency} = T_{PE,P2,C,sched,i} - T_{PE,P0-P1,REF,sched,i}$ (6.148)

$= (S_{PE,P0,sched} + S_{PE,P1,sched}|_{min}) + 2(R_{PP} + W_{Deskew,post}) + (C_{PE,P1,sched} + C_{PE,P2,sched})$ (6.149)
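
The latency is independent of the message index i because every term is a constant. A minimal Python sketch of the closed form, with illustrative names and integer tick values assumed:

```python
def pe_latency(s_p0_sched, s_p1_sched_min, r_pp, w_post, c_p1_sched, c_p2_sched):
    """PE message latency in local clock ticks, following (6.149)."""
    send_delays = s_p0_sched + s_p1_sched_min   # Send Process delays in P0 and P1
    link_delays = 2 * (r_pp + w_post)           # two hops: reception delay plus post-window
    compute_delays = c_p1_sched + c_p2_sched    # Computation Process delays in P1 and P2
    return send_delays + link_delays + compute_delays
```

With send delays of 2 and 2 ticks, a reception delay of 6, a post-expectation window of 4, and computation delays of 7 and 8, the latency is 39 ticks.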


6.4. Synchronization Preservation Protocol 

Section 5 of this document examines the timing aspects of the Synchronization Preservation protocol. 


6.5. Miscellaneous considerations 


6.5.1. Time gap between Sync Reset and Collective Diagnosis 

$\Delta_{CD,begin}$ denotes the time from the synchronization reset to the beginning of the Collective Diagnosis
protocol, measured in local clock ticks.

$\Delta_{CD,begin} = T_{CD}$ (6.150)

If needed, $\Delta_{CD,begin}$ can be chosen to ensure that the delay from the send time of the ECHO message in
Initial Synchronization or Synchronization Preservation to the send time of the first message in Collective
Diagnosis satisfies the minimum data introduction interval for the Communication Module. At the
RMUs, which are the last to send ECHO messages, this constraint corresponds to:

$T_{CD,P0,SND} - (T_{P3,A} + B_{P3}) \ge \Delta_{Comm}$ (6.151)

Or:

$H_{P3} - B_{P3} + T_{CD} + S_{CD,P0} \ge \Delta_{Comm}$ (6.152)

6.5.2. Time gap between Collective Diagnosis and Schedule Update 

$\Delta_{SU,begin}$ denotes the time from the end of the Collective Diagnosis protocol to the beginning of the
Schedule Update protocol, measured in local clock ticks.

$\Delta_{SU,begin} = T_{SU} - T_{CD} - \Delta_{CD}$ (6.153)

If needed, $\Delta_{SU,begin}$ can be chosen to ensure that the delay from the send time of the last Collective
Diagnosis message to the send time of the first Schedule Update message satisfies the minimum data
introduction interval for the Communication Module. At the BIUs, this constraint corresponds to:

$T_{SU,P0,SND,0} - T_{CD,P3,SND} \ge \Delta_{Comm}$ (6.154)

At the RMUs, the constraint is:

$T_{SU,P1,SND,0} - T_{CD,P3,SND} \ge \Delta_{Comm}$ (6.155)

6.5.3. Time gap between Schedule Update and PE Communication 

$\Delta_{PE,begin}$ denotes the actual time delay from the end of the Schedule Update protocol to the beginning of
the PE Communication protocol, measured in local clock ticks.

$\Delta_{PE,begin} = T_{PE} - T_{SU} - \Delta_{SU}$ (6.156)

If needed, $\Delta_{PE,begin}$ can be chosen to ensure that the delay from the send time of the last Schedule
Update message to the send time of the first PE Communication message satisfies the minimum data
introduction interval for the Communication Module. At the BIUs, this constraint corresponds to:

$T_{PE,P0,SND,0} - T_{SU,P2,SND,N-1} \ge \Delta_{Comm}$ (6.157)

At the RMUs, this constraint is:

$T_{PE,P1,SND,0} - T_{SU,P3,SND,N-1} \ge \Delta_{Comm}$ (6.158)

6.5.4. Time gap between PE Communication and Synchronization Preservation 

$\Delta_{SP,begin}$ denotes the actual time delay from the end of the PE Communication protocol to the beginning
of the Synchronization Preservation protocol, measured in local clock ticks.

$\Delta_{SP,begin} = T_{SP} - T_{PE} - \Delta_{PE}$ (6.159)

If needed, $\Delta_{SP,begin}$ can be chosen to ensure that the delay from the send time of the last PE
Communication message to the send time of the first Synchronization Preservation message satisfies the
minimum data introduction interval for the Communication Module. At the BIUs, this constraint
corresponds to:

$(T_{SP} + B_{SP,P0}) - T_{PE,P2,SND,N-1} \ge \Delta_{Comm}$ (6.160)

At the RMUs, this constraint is: 

(Tpe,P1,RCV,E - App,RCvlabs-max + Api + Bpj) - TpE^SND.N-l - ^Comm ( 6 . 161 ) 

7. Self-Test mode


This section examines the timing aspects of the Self-Test mode. The main objective is to determine a 
bound on the relative time skew at the end of this mode. 


7.1. Bound on the relative time skew at the beginning of the Self-Test mode 

An RPP enters the Self-Test mode for startup after receiving a power-on enable, or for restart after the 
detection of a local failure or a failure of the clique. 


7.1.1. Power-on enable

It is assumed that the nodes enter the Self-Test mode immediately after power-on enable. $\delta_{POE}$ denotes
the actual duration of the time interval within which the nodes are enabled, measured in units of seconds.
$\delta_{POE}|_{max}$ denotes the upper bound for $\delta_{POE}$. $\pi_{POE}$ denotes the upper bound on the relative time skew at
power-on enable, measured in nominal clock ticks. $\tau_0$ denotes the nominal duration of a clock tick
measured in seconds. $\delta_{POE}$ and $\pi_{POE}$ are related as follows:

$\pi_{POE} = \delta_{POE}|_{max} / \tau_0$ (7.1)


7.1.2. Local failure or bus failure 

$\delta_{FCP}$ denotes the actual duration of a fault-causing phenomenon measured in units of seconds. We
assume that the duration of the fault-causing phenomenon as experienced by individual nodes can be
effectively 0. ($\delta_{FCP} = 0$ means that the phenomenon has a negligibly small duration, not that the
phenomenon has no effect.)

$\delta_{FCP}|_{min} = 0$ (7.2)

$\delta_{FCP}|_{max}$ depends on the characteristics of the fault-causing phenomenon for which the design is targeted.

$\Delta_{FD}$ denotes the actual duration of the failure-detection delay measured in local clock ticks. We
assume that it is possible for a node to detect a failure condition immediately. $\Delta_{FD}|_{max}$ is
implementation-dependent. $\delta_{FD}$ denotes the actual duration of the failure-detection delay measured in
nominal clock ticks.

$\delta_{FD}|_{min} = 0$ (7.3)

$\delta_{FD}|_{max} = (1 + \rho_0)\Delta_{FD}|_{max}$ (7.4)

Let $t_{FCP,0}$ denote the time at which the fault-causing phenomenon begins. Let $t_{restart,l}$ and $t_{restart,h}$ denote
the earliest and latest times, respectively, at which nodes affected by the fault-causing phenomenon enter
the Self-Test mode.

$t_{restart,l} = t_{FCP,0} + \delta_{FCP}|_{min} + \delta_{FD}|_{min}$

$= t_{FCP,0}$ (7.5)

And:

$t_{restart,h} = t_{FCP,0} + \delta_{FCP}|_{max}/\tau_0 + (1 + \rho_0)\Delta_{FD}|_{max}$ (7.6)

Let $\pi_{restart}$ denote the upper bound on the relative time skew when entering the Self-Test mode for
restart, measured in nominal clock ticks.

$\pi_{restart} = t_{restart,h} - t_{restart,l}$

$= \delta_{FCP}|_{max}/\tau_0 + (1 + \rho_0)\Delta_{FD}|_{max}$ (7.7)
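
To make the units concrete, the following minimal Python sketch evaluates the two skew bounds at the beginning of the Self-Test mode, (7.1) and (7.7). It is only an illustration: the function names are hypothetical, durations of physical phenomena are given in seconds, the failure-detection delay is given in local clock ticks, and the results are in nominal clock ticks.

```python
def poe_skew_bound(d_poe_max_s, tau0_s):
    """pi_POE, (7.1): power-on enable interval converted to nominal ticks."""
    return d_poe_max_s / tau0_s


def restart_skew_bound(d_fcp_max_s, tau0_s, rho0, fd_max_ticks):
    """pi_restart, (7.7): worst-case spread of restart times in nominal ticks.

    d_fcp_max_s  -- upper bound on the fault-causing phenomenon duration (seconds)
    tau0_s       -- nominal duration of a clock tick (seconds)
    rho0         -- bound on the clock drift rate
    fd_max_ticks -- upper bound on the failure-detection delay (local ticks)
    """
    # A count of local ticks on the slowest clock spans (1 + rho0) nominal ticks each.
    return d_fcp_max_s / tau0_s + (1 + rho0) * fd_max_ticks
```

For instance, a 1 ms phenomenon with a 1 microsecond tick, a drift bound of 1e-4, and a detection delay of at most 100 local ticks gives a bound of about 1100.01 nominal ticks.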


7.2. Duration of the Self-Test mode 

The Local Upset Abatement Delay (LUAD) for a transient-fault scenario is defined as the delay from 
the time the fault-causing phenomenon reaches a node until the node has regained control of its local 
operation. Local regaining of control is assumed to occur after the node has detected the failure 
condition, at which time the node disables its broadcast outputs and transitions to the Self-Test mode. In 
the Self-Test mode, the node first performs a full local reset and then begins the execution of the self-test 
procedure. This local reset activity should cover the Communication and Computation modules. The 
duration of the reset is implementation-dependent. Let $\delta_{LUAD}$ denote the actual duration of the Local
Upset Abatement Delay, measured in units of nominal clock ticks.

$\delta_{LUAD}|_{max} = t_{restart,h} - t_{FCP,0}$

$= \delta_{FCP}|_{max}/\tau_0 + (1 + \rho_0)\Delta_{FD}|_{max}$ (7.8)

The Observed Upset Abatement Delay (OUAD) for a transient-fault scenario is defined as the delay 
from the time the fault-causing phenomenon begins until the affected nodes can be consistently 
recognized by all direct observers as being untrustworthy. Note that this delay is defined with respect to 
the effects perceived by the observing nodes. Thus, the message reception delay must be taken into 
consideration. Let $\delta_{OUAD}$ denote the actual duration of the Observed Upset Abatement Delay, measured in
units of nominal clock ticks.

$\delta_{OUAD}|_{max} = \delta_{LUAD}|_{max} + r_{PP,h}$

$= \delta_{FCP}|_{max}/\tau_0 + (1 + \rho_0)\Delta_{FD}|_{max} + r_{PP,h}$ (7.9)

$\Delta_{STM}$ denotes the duration of the Self-Test mode for a ROBUS node, measured in units of local clock
ticks. $\Delta_{STM}$ is assumed constant. The duration of the Self-Test mode must satisfy the timing requirements
for the expected transient-fault scenarios. To increase the probability that a restarting node does not trust
an affected node, we require that the restarting nodes exit the Self-Test mode only after the latest time at
which affected nodes can be incorrectly diagnosed as trustworthy.

$t_{restart,l} + \Delta_{STM}/(1 + \rho_0) \ge t_{restart,h} + r_{PP,h}$

$\Delta_{STM}/(1 + \rho_0) \ge \pi_{restart} + r_{PP,h}$

$\Delta_{STM}/(1 + \rho_0) \ge \delta_{OUAD}|_{max}$ (7.10)

In terms of local clock ticks, the above inequality corresponds to the following constraint:

$\Delta_{STM} \ge \lceil (1 + \rho_0)\delta_{OUAD}|_{max} \rceil$ (7.11)
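
The constraint on the Self-Test duration can be evaluated directly from the fault-scenario parameters. The sketch below is a hedged illustration in Python: the function names are hypothetical, floating-point arithmetic is assumed to be accurate enough for the chosen values, and exact rational arithmetic would be preferable when the product lands very close to an integer.

```python
import math


def ouad_max(d_fcp_max_s, tau0_s, rho0, fd_max_ticks, r_pp_h):
    """Upper bound on the Observed Upset Abatement Delay in nominal ticks, (7.9)."""
    return d_fcp_max_s / tau0_s + (1 + rho0) * fd_max_ticks + r_pp_h


def min_self_test_ticks(d_fcp_max_s, tau0_s, rho0, fd_max_ticks, r_pp_h):
    """Smallest Self-Test duration, in local ticks, that satisfies (7.11)."""
    bound = ouad_max(d_fcp_max_s, tau0_s, rho0, fd_max_ticks, r_pp_h)
    # Even the fastest clock, which counts (1 + rho0) local ticks per nominal
    # tick, must not finish the Self-Test mode before the bound has elapsed.
    return math.ceil((1 + rho0) * bound)
```

With a 1 ms phenomenon, a 1 microsecond tick, a drift bound of 1e-4, a detection delay of at most 100 local ticks, and a maximum reception delay of 12.5 nominal ticks, the Self-Test mode must last at least 1113 local ticks.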

7.3. Bound on the relative local-time skew at the end of the Self-Test mode 

$\pi_{STM}$ denotes the upper bound on the relative time skew at the end of the Self-Test mode, measured in
nominal clock ticks.

$\pi_{STM} = \max(\pi_{POE}, \pi_{restart}) + [(1 + \rho_0) - 1/(1 + \rho_0)]\Delta_{STM}$ (7.12)
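
The bracketed factor is the difference between the real time taken by the slowest and the fastest clocks to count one local tick, so the skew grows linearly with the length of the Self-Test mode. A minimal Python sketch with illustrative names:

```python
def drift_factor(rho0):
    """Skew accumulated per local tick between the slowest and fastest clocks."""
    return (1 + rho0) - 1 / (1 + rho0)


def self_test_end_skew(pi_poe, pi_restart, rho0, d_stm_ticks):
    """Upper bound on the relative skew at the end of the Self-Test mode, (7.12)."""
    initial_skew = max(pi_poe, pi_restart)          # worst case over startup and restart
    return initial_skew + drift_factor(rho0) * d_stm_ticks
```

For a drift bound of 1e-4 the factor is roughly 2e-4, so a Self-Test mode of one million local ticks adds about 200 nominal ticks of skew to the initial bound.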

8. Clique Detection mode


The Clique Detection mode is composed of three main processes: Local Diagnosis Acquisition, 
Synchronization Acquisition, and Collective Diagnosis Acquisition. 


8.1. Local Diagnosis Acquisition 

Local Diagnosis Acquisition (a.k.a., Preliminary Diagnosis) is composed of two consecutive 
observation intervals, each with a duration at least as large as a resynchronization interval. $\Delta_{PD,begin}$
denotes the delay from the time a node exits the Self-Test mode until the beginning of the first
observation interval, measured in local-clock ticks. The value of $\Delta_{PD,begin}$ is determined by the
implementation and assumed constant.


8.1.1. Bound on the duration of an observation phase 

It is assumed that, at the earliest, a node in Local Diagnosis Acquisition can detect the absence of a 
valid clique as soon as it enters the observation phase. $\Delta_{PD,OW}$ denotes the duration of the observation
intervals (or "windows"), measured in local-clock ticks. The value of $\Delta_{PD,OW}$ is determined by the
implementation and assumed constant. $\Delta_{PD,OW}$ should be large enough to cover the duration of a
resynchronization cycle measured in local-clock ticks.

$\Delta_{PD,OW} \ge P$ (8.1)

P is the nominal resynchronization period given in Section 5 of this document. 


8.1.2. Bound on the duration of Local Diagnosis Acquisition 

Let $\delta_{PD}$ denote the actual duration of Local Diagnosis Acquisition measured in nominal clock ticks.

$\delta_{PD}|_{min} = \Delta_{PD,begin}/(1 + \rho_0)$ (8.2)

$\delta_{PD}|_{max} = (1 + \rho_0)(\Delta_{PD,begin} + 2\Delta_{PD,OW})$ (8.3)


8.2. Synchronization Acquisition 

Synchronization Acquisition is composed of the Frame Synchronization and Synchronization Capture 
protocols. Synchronization Acquisition ends with the synchronization reset, at which point the local time 
is set to 0. 


8.2.1. Frame Synchronization 

It is assumed that a node can detect the absence of a valid clique at any time during Synchronization 
Acquisition. $\Delta_{FS,begin}$ denotes the delay from the end of the second observation window during Local
Diagnosis Acquisition to the beginning of the Frame Synchronization protocol during Synchronization
Acquisition, measured in local clock ticks. $\Delta_{FS,begin}$ is implementation-dependent and assumed constant.

$\Delta_{FS}$ denotes the actual duration of the execution of the Frame Synchronization protocol measured in
local clock ticks. $\Delta_{FS}|_{max}$ is given in Section 5 of this document. $\delta_{FS}$ denotes the actual duration of the
Frame Synchronization protocol measured in nominal clock ticks.

$\delta_{FS}|_{max} = (1 + \rho_0)\Delta_{FS}|_{max}$ (8.4)


8.2.2. Synchronization Capture 

We assume that the Synchronization Capture protocol begins immediately after the Frame 
Synchronization protocol is complete. We want to determine a bound on the duration of the Synchronization
Capture protocol.

$\delta_{SC}$ denotes the actual duration of the execution of the Synchronization Capture protocol measured in
nominal clock ticks. The execution of the protocol may begin shortly after the ECHO messages are
transmitted by the clique during the execution of the Synchronization Preservation protocol. In that case,
the end of the Synchronization Capture protocol would occur after the reset is applied during the next
execution of the Synchronization Preservation protocol. To specify a bound for the duration of the
Synchronization Capture protocol, we consider an interval containing two consecutive executions of the
Synchronization Preservation protocol. $\delta_{SP}|_{max}$ denotes the upper bound on the real-time duration of the
execution of the Synchronization Preservation protocol. $\delta_{SP}|_{max}$ is given in Section 5 of this document.
$T_{SP}$ denotes the scheduled local time at which the execution of the Synchronization Preservation protocol
begins.

$\delta_{SC}|_{max} = P_{max} + \delta_{SP}|_{max}$

$= (1 + \rho_0)T_{SP} + 2\delta_{SP}|_{max}$ (8.5)


8.2.3. Bound on the duration of Synchronization Acquisition 

Let $\delta_{SA}$ denote the actual duration of the execution of Synchronization Acquisition measured in
nominal clock ticks.

$\delta_{SA}|_{max} = (1 + \rho_0)\Delta_{FS,begin} + \delta_{FS}|_{max} + \delta_{SC}|_{max}$ (8.6)

$\Delta_{SA}$ denotes the actual duration of Synchronization Acquisition measured in local clock ticks. We
want to ensure that a count of $\Delta_{SA}|_{max}$ local ticks takes no fewer than $\delta_{SA}|_{max}$ nominal ticks.

$\Delta_{SA}|_{max}/(1 + \rho_0) \ge \delta_{SA}|_{max}$ (8.7)

We choose the minimum value that satisfies that constraint.

$\Delta_{SA}|_{max} = \lceil (1 + \rho_0)\delta_{SA}|_{max} \rceil$ (8.8)
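
The bounds (8.4) through (8.8) compose into a single calculation. The following minimal Python sketch assumes that the Section 5 quantities are supplied as inputs; the function and parameter names are illustrative, and floating-point rounding is assumed not to matter for the chosen values.

```python
import math


def sync_acquisition_bounds(rho0, fs_begin_ticks, fs_max_ticks, t_sp, d_sp_max):
    """Return (delta_SA|max in nominal ticks, Delta_SA|max in local ticks).

    fs_begin_ticks -- Delta_FS,begin (local ticks)
    fs_max_ticks   -- Delta_FS|max from Section 5 (local ticks)
    t_sp           -- T_SP, scheduled local time of Synchronization Preservation
    d_sp_max       -- delta_SP|max from Section 5 (nominal ticks)
    """
    d_fs_max = (1 + rho0) * fs_max_ticks                        # (8.4)
    d_sc_max = (1 + rho0) * t_sp + 2 * d_sp_max                 # (8.5)
    d_sa_max = (1 + rho0) * fs_begin_ticks + d_fs_max + d_sc_max  # (8.6)
    # Smallest local-tick count whose fastest execution still covers d_sa_max, (8.8).
    return d_sa_max, math.ceil((1 + rho0) * d_sa_max)
```

The local-tick count is rounded up so that the constraint (8.7) holds even on the fastest clock.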

8.3. Bound on the duration of the Clique Detection mode

Synchronization Acquisition ends with the synchronization reset, at which point the local time is set to
0. From that point on, the local time should be synchronized to the clique in Preservation mode. The
delays to begin and complete the Collective Diagnosis Acquisition protocol in the Clique Detection mode
are the same as for the Collective Diagnosis protocol in Clique Preservation mode. $\Delta_{CD,begin}$ denotes the
time from the synchronization reset to the beginning of the Collective Diagnosis protocol, measured in
local clock ticks. $\Delta_{CD}$ denotes the time to complete the execution of the Collective Diagnosis protocol in
local clock ticks. The transition to the Clique Join mode occurs at the beginning of execution of the
Schedule Update protocol. Before that point, a detected failure attributable to the absence of a clique
results in a transition to the Clique Initialization mode. $\Delta_{SU,begin}$ denotes the time from the end of the
Collective Diagnosis protocol to the beginning of the Schedule Update protocol, measured in local clock
ticks. $\Delta_{CD,begin}$, $\Delta_{CD}$, and $\Delta_{SU,begin}$ are implementation-dependent and determined by the time-indexed
operation schedule specifying the timing for bus activities.

After detecting the absence of a valid clique, a node clears its state and transitions to the Clique
Initialization mode. $\Delta_{CDM-CIM}$ denotes the delay to transition to the Clique Initialization mode after
detecting the absence of a valid clique, measured in units of local clock ticks. $\Delta_{CDM-CIM}$ is
implementation-dependent and assumed constant. $\delta_{CDM}$ denotes the actual duration of the Clique
Detection mode for a ROBUS node, measured in units of nominal clock ticks.

$\delta_{CDM}|_{min} = \delta_{PD}|_{min} + \Delta_{CDM-CIM}/(1 + \rho_0)$ (8.9)

$\delta_{CDM}|_{max} = \delta_{PD}|_{max} + \delta_{SA}|_{max} + [(1 + \rho_0)(\Delta_{CD,begin} + \Delta_{CD} + \Delta_{SU,begin} + \Delta_{CDM-CIM})]$ (8.10)

9. Clique Initialization mode


This section examines timing aspects related to the Clique Initialization mode. Only Initial Diagnosis 
and Initial Synchronization are discussed. The operation during Collective Diagnosis is the same as 
during Clique Preservation mode, which is discussed in Section 6. 


9.1. Bound on the relative time skew at the beginning of the Clique Initialization mode 

$\pi_{CIM,BEGIN}$ denotes the upper bound on the relative time skew at the beginning of the Clique
Initialization mode, measured in nominal clock ticks.

$\pi_{CIM,BEGIN} = \pi_{STM} + (\delta_{CDM}|_{max} - \delta_{CDM}|_{min})$ (9.1)

$\pi_{STM}$ is defined in Section 7, and $\delta_{CDM}|_{max}$ and $\delta_{CDM}|_{min}$ are defined in Section 8 of this document.


9.2. Initial Diagnosis 

To simplify the presentation, we would like to compute a single upper bound for the relative
local-time skew during the execution of the Initial Diagnosis and Initial Synchronization protocols. Let
$\pi_{ID+IS}$ denote that bound, measured in nominal clock ticks.

Figure 9.1 shows the message flow graph for Initial Diagnosis. BIUs and RMUs are assumed to have 
the same timing characteristics. The analysis presented here does not refer to the kind of the node 
sending or receiving messages for any of the protocol processes. 



Figure 9.1: Message flow graph for the Initial Diagnosis protocol 

$\Delta_{ID,begin}$ denotes the delay from the time a node enters the Clique Initialization mode until the time it
begins the execution of the Initial Diagnosis protocol, measured in units of local clock ticks. The value of
$\Delta_{ID,begin}$ is determined by the implementation and assumed constant.


9.2.1. Communication between processes P0 and P1

The following variables are defined: $T_{ID}$ denotes the local time triggering the execution of the Initial
Diagnosis protocol; $T_{ID,P0-P1,REF}$ denotes the reference time for the communication between processes P0
and P1; $T_{ID,P0,SND}$ denotes the time at which process P0 sends the message; $T_{ID,P1,RCV,E}$ denotes the
expected time of reception in process P1; $S_{ID,P0}$ denotes the Send Process delay for process P0;
$\Delta_{ID,P1,RCVWND}$ denotes the delay from the communication reference time to the opening of the reception
window in process P1; $R_{PP}$ denotes the nominal point-to-point reception delay; $W_{ID,Deskew}$ denotes the size
of the deskewing window in process P1; $W_{ID,Deskew,pre}$ denotes the pre-expectation window in process P1
(i.e., the size of the section of the deskewing window before the expected time of reception); $W_{ID,Deskew,post}$
denotes the post-expectation window in process P1 (i.e., the size of the section of the deskewing window
after the expected time of reception); $\Delta_{ID,PP,RCV}|_{abs-max}$ denotes the absolute value of the maximum error in
the actual time of reception in process P1 for a good source-receiver pair; $C_{ID,P1}$ denotes the computation
delay in process P1 (the computation delay is measured from the end of the deskewing window, and $C_{ID,P1}$ is
assumed constant); $\Delta_{ID,P1,C-END}$ denotes the delay in process P1 from the end of the computation to the end
of the execution of the Initial Diagnosis protocol ($\Delta_{ID,P1,C-END}$ is assumed constant); and $\Delta_{ID}$ denotes the
duration of the execution of the Initial Diagnosis protocol.

$T_{ID}$ is the reference time for the communication between processes P0 and P1. Given that the local
time is reset at the start of the Clique Initialization mode, then:

$T_{ID,P0-P1,REF} = T_{ID} = \Delta_{ID,begin}$ (9.2)

To determine $W_{ID,Deskew}$, we need the maximum error in the expected time of reception for the Initial
Diagnosis protocol messages, $\Delta_{ID,PP,RCV}|_{abs-max}$. Based on the analysis presented for point-to-point
communication:

$\Delta_{ID,PP,RCV}|_{abs-max} = \lfloor (1 + \rho_0)(\pi_{ID+IS} + \max(\mu_{PP,l}, \mu_{PP,h})) \rfloor$ (9.3)

$\mu_{PP,l}$ and $\mu_{PP,h}$ are given in Section 4 of this document. For the deskewing window:

$W_{ID,Deskew} = 2\Delta_{ID,PP,RCV}|_{abs-max} + 1$ (9.4)

$W_{ID,Deskew,pre} = \Delta_{ID,PP,RCV}|_{abs-max}$ (9.5)

$W_{ID,Deskew,post} = \Delta_{ID,PP,RCV}|_{abs-max} + 1$ (9.6)
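
The deskewing window sizes follow mechanically from the skew bound. The following minimal Python sketch evaluates (9.3) through (9.6); the function and key names are illustrative, and the inputs $\pi_{ID+IS}$, $\mu_{PP,l}$, and $\mu_{PP,h}$ are assumed to be supplied in nominal clock ticks.

```python
import math


def id_deskew_windows(rho0, pi_id_is, mu_pp_l, mu_pp_h):
    """Return the Initial Diagnosis deskewing window sizes in local clock ticks."""
    # Maximum reception-time error, converted to local ticks and rounded down, (9.3).
    err = math.floor((1 + rho0) * (pi_id_is + max(mu_pp_l, mu_pp_h)))
    return {
        "window": 2 * err + 1,  # (9.4): symmetric error plus the expected tick itself
        "pre": err,             # (9.5): part before the expected time of reception
        "post": err + 1,        # (9.6): part after the expected time of reception
    }
```

The pre-expectation and post-expectation parts always add up to the full window, which is what allows $W_{ID,Deskew}$ to appear alone in the protocol duration.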


We expect the upper bound on the relative local-time skew during the execution of the protocol to be 
much larger than any minimum timing constraints associated with the process of communication. Based 
on this, we assume that the following condition holds for the communication between processes P0 and
P1.

$S_{ID,P0}|_{min} + R_{PP} < \Delta_{ID,P1,RCVWND}|_{min} + W_{ID,Deskew,pre}$ (9.7)

For this case:

$S_{ID,P0} = \Delta_{ID,P1,RCVWND}|_{min} + W_{ID,Deskew,pre} - R_{PP}$ (9.8)

And:

$\Delta_{ID,P1,RCVWND} = \Delta_{ID,P1,RCVWND}|_{min}$ (9.9)

So:

$T_{ID,P0,SND} = T_{ID,P0-P1,REF} + S_{ID,P0}$

$= T_{ID} + \Delta_{ID,P1,RCVWND}|_{min} + W_{ID,Deskew,pre} - R_{PP}$ (9.10)

And:

$T_{ID,P1,RCV,E} = T_{ID,P0,SND} + R_{PP}$

$= T_{ID} + \Delta_{ID,P1,RCVWND}|_{min} + W_{ID,Deskew,pre}$ (9.11)

9.2.2. Bound on the duration of the Initial Diagnosis protocol 

Let $T_{ID,P1,C}$ denote the local-time at which the Computation Process outputs the result for process P1.

$T_{ID,P1,C} = T_{ID,P1,RCV,E} + W_{ID,Deskew,post} + C_{ID,P1}$ (9.12)

Let $T_{ID,P1,END}$ denote the local-time at which the execution of the Initial Diagnosis protocol ends.

$T_{ID,P1,END} = T_{ID,P1,C} + \Delta_{ID,P1,C-END}$ (9.13)

The duration of the execution of the Initial Diagnosis protocol is:

$\Delta_{ID} = T_{ID,P1,END} - T_{ID}$

$= \Delta_{ID,P1,RCVWND}|_{min} + W_{ID,Deskew} + C_{ID,P1} + \Delta_{ID,P1,C-END}$ (9.14)

9.3. Initial Synchronization 

Let $T_{IS}$ denote the local time triggering the execution of the Initial Synchronization protocol. $\Delta_{IS,begin}$
denotes the delay from the end of Initial Diagnosis to the beginning of Initial Synchronization, measured
in units of local clock ticks. The value of $\Delta_{IS,begin}$ is determined by the implementation and assumed
constant.

$\Delta_{IS,begin} = T_{IS} - T_{ID,P1,END}$ (9.15)

9.3.1. Bound on the relative skew at the beginning of the Initial Synchronization protocol 

Let $\pi_{IS,BEGIN}$ denote the upper bound on the relative local-time skew at the beginning of the Initial
Synchronization protocol, measured in nominal clock ticks.

$\pi_{IS,BEGIN} = \pi_{CIM,BEGIN} + [(1 + \rho_0) - 1/(1 + \rho_0)](\Delta_{ID,begin} + \Delta_{ID} + \Delta_{IS,begin})$ (9.16)

9.3.2. Communication between processes P0 and P1

This is discussed in Section 5 of this document. There, $\pi_{IS}$ denotes the bound on the relative skew
during the execution of the Initial Synchronization protocol. Thus:

$$\pi_{IS} = \pi_{ID+IS} \tag{9.17}$$
9.3.3. Bound on the duration of the Initial Synchronization protocol

$\delta_{IS}|_{max}$ denotes the upper bound on the real-time duration of the execution of the Initial Synchronization
protocol, measured from the earliest time at which a node begins executing the protocol to the latest time
at which a node applies the synchronization reset. $\delta_{IS,Sync}|_{max}$, $\Delta_{IS,P2,H,h}$, $\Delta_{IS,P3,H,h}$, and $\Delta_{IS,P4,H,h}$ are given by
$\delta_{Sync}|_{max}$, $\Delta_{Sync,P2,H,h}$, $\Delta_{Sync,P3,H,h}$, and $\Delta_{Sync,P4,H,h}$ in Section 5 of this document with B P o replaced by B^po.

$$\delta_{IS}|_{max} = \pi_{IS,BEGIN} + \max(\Delta_{IS,P2,H,h}, \Delta_{IS,P3,H,h}, \Delta_{IS,P4,H,h}) \tag{9.18}$$

Let $\Delta_{IS}|_{max}$ denote the upper bound on the duration of the execution of the Initial Synchronization
protocol, measured in local clock ticks. We want the fastest count of $\Delta_{IS}|_{max}$ ticks to be greater than or
equal to $\delta_{IS}|_{max}$.

$$\Delta_{IS}|_{max} / (1+\rho_0) \ge \delta_{IS}|_{max} \tag{9.19}$$

We choose the following value for $\Delta_{IS}|_{max}$.

$$\Delta_{IS}|_{max} = \lceil (1+\rho_0)\,\delta_{IS}|_{max} \rceil \tag{9.20}$$
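
As a numerical illustration of (9.19) and (9.20), the following minimal Python sketch computes the tick count with exact rational arithmetic, so that the ceiling is not affected by floating-point rounding. The function name and the parameter values are hypothetical and are not taken from this document.

```python
from fractions import Fraction
from math import ceil


def local_tick_bound(delta_is_max, rho_0):
    """Smallest integer tick count D such that D / (1 + rho_0) >= delta_is_max."""
    return ceil((1 + Fraction(rho_0)) * Fraction(delta_is_max))


# Hypothetical example values (assumptions, not from the source).
rho_0 = Fraction(1, 10_000)      # assumed drift bound of 100 ppm
delta_is_max = Fraction(2500)    # assumed real-time bound, in nominal ticks
ticks = local_tick_bound(delta_is_max, rho_0)
```

With these assumed values, the fastest clock still needs at least `delta_is_max` nominal ticks to count `ticks` local ticks, and one tick fewer would not satisfy (9.19).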


9.4. Bound on the relative skew during Initial Diagnosis and Initial Synchronization 

The bound on the relative local-time skew during Initial Diagnosis and Initial Synchronization is:

$$\pi_{ID+IS} = \pi_{IS,BEGIN} + [(1+\rho_0) - 1/(1+\rho_0)]\,\delta_{IS}|_{max} \tag{9.21}$$

The following variables are defined in order to simplify the expressions presented below.

$$X_{IS,H,h} = \max(\Delta_{IS,P2,H,h}, \Delta_{IS,P3,H,h}, \Delta_{IS,P4,H,h}) \tag{9.22}$$

$$\sigma_0 = [(1+\rho_0) - 1/(1+\rho_0)] \tag{9.23}$$

Then:

$$\begin{aligned}
\pi_{ID+IS} &= \pi_{IS,BEGIN} + \sigma_0(\pi_{IS,BEGIN} + X_{IS,H,h}) \\
&= \sigma_0 X_{IS,H,h} + (1+\sigma_0)\,\pi_{IS,BEGIN} \\
&= \sigma_0 X_{IS,H,h} + (1+\sigma_0)[\pi_{CIM,BEGIN} + \sigma_0(\Delta_{ID,begin} + \Delta_{ID} + \Delta_{IS,begin})] \\
&= \sigma_0 X_{IS,H,h} + (1+\sigma_0)[\pi_{CIM,BEGIN} + \sigma_0(\Delta_{ID,begin} + \Delta_{IS,begin})] + \sigma_0(1+\sigma_0)\Delta_{ID}
\end{aligned} \tag{9.24}$$

The following inequality holds for $W_{ID,Deskew}$:

$$W_{ID,Deskew} < 2(1+\rho_0)[\pi_{ID+IS} + \max(\mu_{PP,l}, \mu_{PP,h})] + 1 \tag{9.25}$$

Applying this inequality to $\Delta_{ID}$, then:

$$\begin{aligned}
\pi_{ID+IS} \le{}& \sigma_0 X_{IS,H,h} + (1+\sigma_0)[\pi_{CIM,BEGIN} + \sigma_0(\Delta_{ID,begin} + \Delta_{IS,begin})] \\
&+ \sigma_0(1+\sigma_0)\{\Delta_{ID,P1,RCVWND}|_{min} + C_{ID,P1} + \Delta_{ID,P1,C\text{-}END} \\
&+ [2(1+\rho_0)(\pi_{ID+IS} + \max(\mu_{PP,l}, \mu_{PP,h})) + 1]\}
\end{aligned} \tag{9.26}$$

Again, the definition of the following variable simplifies the presentation.

$$\begin{aligned}
Y ={}& \sigma_0 X_{IS,H,h} + (1+\sigma_0)[\pi_{CIM,BEGIN} + \sigma_0(\Delta_{ID,begin} + \Delta_{IS,begin})] \\
&+ \sigma_0(1+\sigma_0)(\Delta_{ID,P1,RCVWND}|_{min} + C_{ID,P1} + \Delta_{ID,P1,C\text{-}END})
\end{aligned} \tag{9.27}$$

So:

$$\begin{aligned}
\pi_{ID+IS} &\le Y + \sigma_0(1+\sigma_0)[2(1+\rho_0)(\pi_{ID+IS} + \max(\mu_{PP,l}, \mu_{PP,h})) + 1] \\
\pi_{ID+IS} &\le \{Y + \sigma_0(1+\sigma_0)[2(1+\rho_0)\max(\mu_{PP,l}, \mu_{PP,h}) + 1]\} / \{1 - 2\sigma_0(1+\sigma_0)(1+\rho_0)\}
\end{aligned} \tag{9.28}$$

We choose the right side of this expression as the value for $\pi_{ID+IS}$.

$$\pi_{ID+IS} = \{Y + \sigma_0(1+\sigma_0)[2(1+\rho_0)\max(\mu_{PP,l}, \mu_{PP,h}) + 1]\} / \{1 - 2\sigma_0(1+\sigma_0)(1+\rho_0)\} \tag{9.29}$$
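
The closed form in (9.29) can be evaluated numerically. The sketch below is a minimal Python illustration, not part of the RPP design flow. The function names and all parameter values are hypothetical. It also makes explicit one condition implied by the algebra: the division in (9.28) preserves the direction of the inequality only when the denominator 1 - 2σ0(1 + σ0)(1 + ρ0) is positive.

```python
from fractions import Fraction


def sigma_0(rho_0):
    """sigma_0 = (1 + rho_0) - 1/(1 + rho_0), as in (9.23)."""
    rho_0 = Fraction(rho_0)
    return (1 + rho_0) - 1 / (1 + rho_0)


def y_term(rho_0, pi_cim_begin, x_is, d_id_begin, d_is_begin,
           d_rcvwnd_min, c_id, d_c_end):
    """Y as defined in (9.27)."""
    s = sigma_0(rho_0)
    return (s * x_is
            + (1 + s) * (pi_cim_begin + s * (d_id_begin + d_is_begin))
            + s * (1 + s) * (d_rcvwnd_min + c_id + d_c_end))


def skew_bound(rho_0, y, mu_max):
    """Chosen value of pi_ID+IS from (9.29); mu_max = max(mu_PP,l, mu_PP,h)."""
    rho_0 = Fraction(rho_0)
    s = sigma_0(rho_0)
    k = s * (1 + s)
    denom = 1 - 2 * k * (1 + rho_0)
    if denom <= 0:
        raise ValueError("denominator of (9.29) is not positive")
    return (y + k * (2 * (1 + rho_0) * mu_max + 1)) / denom


# Hypothetical parameter values in clock ticks; none are taken from the source.
rho_0 = Fraction(1, 10_000)
y = y_term(rho_0, pi_cim_begin=10, x_is=200, d_id_begin=50, d_is_begin=50,
           d_rcvwnd_min=100, c_id=20, d_c_end=30)
pi_id_is = skew_bound(rho_0, y, mu_max=5)
```

Because exact fractions are used, the result can be substituted back into the first line of (9.28) and checked to satisfy it with equality.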
10. RPP requirements


ROBUS-2 is a developmental version of ROBUS intended for laboratory experimentation and 
capability demonstrations. The RPP is the component that realizes the characteristic functionality of 
ROBUS. The ROBUS-2 RPP shall implement the behavior summarized in Section 2 of this document 
and described in detail in [Torres 05]. The design of the RPP shall comply with the following additional 
requirements. 

• The Self-Test major mode will not include an actual self-test of the RPP or the FCR. The main 
reason for this requirement is to simplify the design of the RPP. The exclusion of self tests increases 
the likelihood that a faulty node will exit the Self-Test mode and eventually send bad messages to 
trustworthy nodes. Given the fault tolerance capabilities of ROBUS, the resulting increase in the
probability of system failure should be small. That is an acceptable trade-off for this version of the 
RPP. 

• Input-error monitoring shall be performed concurrently with the execution of the Frame 
Synchronization, Synchronization Capture, and Initial Synchronization protocols. This monitoring 
shall be independently performed for each opposite-kind node. Error detection shall be based on the 
validity of the received message sequence in the context of the protocols being executed. The 
detection of an error for a particular input source shall result in an immediate accusation. 

• The error syndromes generated by the Frame Synchronization protocol (see [Torres 05]) shall be 
considered in the determination of the set of eligible voters for the Synchronization Capture protocol. 
This requirement is meant to ensure that the Synchronization Capture protocol is started with the 
latest available diagnostic information. 

• An error check shall be implemented to corroborate the validity of the eligible voters latched at the 
beginning of the execution of the Synchronization Capture and Initial Synchronization protocols. 
These protocols are defined with the assumption that a majority of the latched eligible voters are 
trustworthy. This check shall consist of a comparison of the number of eligible voters at the 
beginning of the protocol execution against the number of eligible voters at the end. A failure shall 
be declared if a majority of the initial eligible voters are not in the final set of eligible voters. The 
determination of the final set of eligible voters shall include the error syndromes generated by the 
concurrent input-error monitors and the cross-lane checks described in the definition of the protocols 
(see [Torres 05]). 

• A timeout check shall be performed for the Synchronization Acquisition and Initial Synchronization 
protocols. These protocols depend on receiving the expected synchronization messages from a 
majority of the eligible voters in order to complete their execution. A timeout error shall result in the 
immediate termination of the protocol execution and the declaration of a failure. 

• A timeout check shall be performed for the resynchronization period. The synchronization period is
the time interval between consecutive synchronization-reset events at a BIU or RMU node generated
by synchronization protocols. Three sequences of synchronization protocols define resynchronization
intervals: from Synchronization Capture to Synchronization Preservation, from Initial
Synchronization to Synchronization Preservation, and from one execution of Synchronization
Preservation to the next.
• The RPP shall be capable of processing Schedule Update messages with a data-introduction interval
greater than or equal to one local oscillator clock tick. 

• For the PE Broadcast protocol, all the BIUs shall broadcast a message for each execution of the 
protocol. The scheduled BIU shall broadcast a message from its attached PE, and all other BIUs shall 
broadcast a diagnostic message corresponding to the current local accusations against RMU nodes. 

• The RPP shall be capable of processing PE Broadcast messages with a data-introduction interval 
greater than or equal to one local oscillator clock tick. 

• For BIUs and RMUs the time interval between the last scheduled PE Broadcast message and the 
message for the Accusation Exchange protocol shall be the same as the data-introduction interval for 
the PE Broadcast messages. So, if there are $K_{sched}$ scheduled PE messages, each BIU and RMU is
expected to transmit $K_{sched} + 1$ messages with a constant DII during the PE Communication mode.

• The RPP shall be able to behave as either a BIU or an RMU. The desired node kind shall be 
selectable at the RPP external interface. 

• The RPP external interface shall include input reset and enable control signals, as well as an output 
signal indicating when the RPP has detected a local or bus failure. This feature enables the 
implementation of a coordinated reset for all the circuitry within a particular FCR. 

• The node identification number shall be selectable at the RPP external interface. 

• The RPP shall be described in the VHDL hardware description language. 

• The RPP VHDL description shall be highly parameterized in the behavioral and structural domains. 
This feature enables the reuse of the RPP description for a wide range of ROBUS-2 system 
characteristics. 

• The synthesized RPP shall fit on a small to medium size FPGA (Field-Programmable Gate Array). 
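
The eligible-voter check required above for the Synchronization Capture and Initial Synchronization protocols can be sketched in a few lines of Python. This is an illustrative model only. The function name is hypothetical, and the sketch assumes that a tie (exactly half of the initial voters surviving) counts as a failure, since the protocols assume a strict majority of trustworthy voters.

```python
def voter_check_fails(initial_voters, final_voters):
    """Return True when a failure should be declared.

    Assumption: a failure is declared unless a strict majority of the
    voters latched at the start are still eligible at the end.
    """
    initial = set(initial_voters)
    surviving = initial & set(final_voters)
    return 2 * len(surviving) <= len(initial)


# Hypothetical node identifiers.
lost_majority = voter_check_fails({1, 2, 3, 4}, {1, 2})
kept_majority = voter_check_fails({1, 2, 3}, {1, 2, 5})
```

In the RPP, the final set would also reflect the syndromes from the concurrent input-error monitors and the cross-lane checks. Here it is simply passed in as an argument.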
11. RPP design description


This section describes the design of the ROBUS-2 protocol processor, including insight into the 
synthesis process, the top-level design, and the design of the RPP sub-units. 

11.1. High-level design concepts 

This section presents some of the ideas that influenced the design of the RPP. 

11.1.1. Functional partitions 

The organization of the RPP is determined by the partitioning of functions along several lines. The 
first is a separation of operational and diagnostic functions. As illustrated in Figure 11.1, the RPP is 
composed of two separate but coordinated systems. The operational system handles the communication 
and computation activities required by the distributed protocols, in addition to all the processing 
associated with the mode logic, local time, and PE communication schedule. The diagnostic system 
monitors the operational system and provides timely information for reconfiguration and error 
containment. To support the fault-tolerance mechanisms, the diagnostic processes must work in close 
coordination with the operational system processes. The basic functions of the diagnostic system are to 
detect errors during the execution of the protocols, assess the status of individual nodes, and assess the 
status of the bus. 



[Diagram labels: Operational system; Diagnostic system; Observations; Node and bus diagnosis]

Figure 11.1: RPP functional partition: operations and diagnostics

The RPP is further partitioned into reception and transmission functions, as illustrated in Figure 11.2. 
The ROBUS protocols require that the nodes be able to simultaneously send and receive messages. The 
data handled by the nodes includes node diagnoses, local-time synchronization events, PE- 
communication schedule messages, and PE messages. For all the protocols, the reception of messages is 
followed by a computation function determined by the protocol processes being executed. The 
computation results can be passed forward for transmission or stored locally, depending on the protocol 
being executed. The reception and transmission functions must be coordinated with respect to timing 
events or the local time, depending on whether a synchronization protocol or a synchronous protocol is 
being executed. 
Figure 11.2: RPP functional partition: reception and transmission


Most of the ROBUS protocols exhibit a high degree of symmetry in the behavior of BIUs and RMUs. 
One of the main distinguishing factors between BIUs and RMUs is that the BIUs have external 
connections to the PEs as well as to the RMUs, while the RMUs communicate only with the BIUs. Since 
the same RPP design must work as either a BIU or an RMU, the RPP must include functionality to 
communicate with the PEs. As shown in Figure 11.3, the RPP is partitioned into bus functions and PE 
interfacing functions. The bus-functions section handles all the functionality for communication between 
BIUs and RMUs, and its implementation exploits the inherent symmetry of the protocols. The PE 
interface is active only when the RPP performs as a BIU. 

[Diagram labels: Protocol; Messages]

Figure 11.3: RPP functional partition: bus functions and PE interface

The activities of the bus are organized in a hierarchy with the following levels: major mode, minor 
mode, protocol, (protocol) process, and (process) step. The major and minor modes are high-level, 
node management states that specify how a node is supposed to interact with the other nodes on the bus. 
The transitions for major and minor modes depend on the local time, computed synchronization events, 
and the diagnostic assessment of the local node and the clique. For a given value of the major and minor 
modes, a node must execute a particular protocol or sequence of protocols. The implementation of the 
protocols requires intricate signaling sequences to control the operation of the functional units that 
perform the actual data processing. As illustrated in Figure 11.4, the RPP control functions are 
partitioned into two groups. The mode control section handles the high-level control functions. The 
protocol control section manages the execution of the protocols. The command from the mode control 
section includes all the high-level state information needed to fully specify the protocol to be executed 
and the diagnostic operations to be performed. 


[Diagram labels: Mode control; Protocol control; Status]

Figure 11.4: RPP functional partition: mode and protocol control
For most protocols executed in more than one major mode, the required processing is the same or
differs only slightly. Many of these major-mode-dependent variations affect the diagnostic system, not 
the operational system. In addition, examination of the required processing for the major and minor mode 
sequences reveals that they can be reduced to a simple protocol execution pattern, as shown in Figure 
11.5. This cyclic pattern with two entry paths, one to merge with an existing clique and the other to form 
a new clique, is the result of the required service delivery sequence and the chosen diagnostic policy for 
ROBUS-2. The transitions to the Collective Diagnosis protocol occur after the synchronization of the 
local time. Once inside the loop, the execution of protocols is triggered by the local time. The blocks in 
Figure 11.5 correspond to the redefined minor modes as actually implemented on the RPP. The protocol
section of the RPP executes all the processing activities corresponding to a minor mode before being 
ready to receive the next command. In addition to the minor mode, the command from the mode control 
section must include all other mode-relevant information needed to properly execute the protocols.


11.1.2. Distributed pipelining 

According to the requirements, the RPP must be capable of processing messages with a DII of one or 
greater (i.e., a maximum message throughput of one message per local clock tick) for the Schedule 
Update and PE Communication modes. The actual DII is a behavioral parameter specified before
synthesis. The selection of the DII depends on factors like the performance of the PE-BIU links, the
throughput capacity of the communication links between BIUs and RMUs, the bit error rate of the links,
and the desired timing-error coverage at the receiving end of the links. Notice that as the DII is decreased,
the reception intervals over which messages are expected to arrive will get closer and eventually overlap. 
The input-timing checks and their effectiveness are necessarily impacted by an overlap in the reception 
intervals. This must be taken into consideration in the design of the RPP. 

The Schedule Update, PE Broadcast, and Accusation Exchange protocols are synchronous protocols. 
The timing of execution of the synchronous protocols is specified by a time-indexed operation schedule.
The scheduling of operations is based on a distributed synchronous composition abstract model of the 
system in which a single oscillator drives a common local-time clock and the fixed-delay processes 
corresponding to the communication and computation operations of the BIUs and RMUs (see [Torres 
05]). 

For this version of the RPP, the local-time clock and the message processing logic are driven by the 
same physical oscillator. To meet the DII requirement, the processing is pipelined with a stage delay of
one clock tick. Because of the deskewing function required for the synchronous communication, the 
actual input-to-output processing delay for any particular message depends on its time of arrival. 
However, the delay from the time a message is read from the input buffer until its processing is complete 
is fixed and determined only by the number of pipeline stages. The reading of messages from the input 
buffers is a time-triggered operation. 
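
The fixed read-to-completion delay described above can be illustrated with a small behavioral model. The Python sketch below is not the RPP implementation. The class name and the stage count are hypothetical. It models a pipeline with a stage delay of one tick that accepts one message per tick (a DII of one) and delivers each message a fixed number of ticks after it is read.

```python
from collections import deque


class FixedLatencyPipeline:
    """Shift-register model: one message may enter per tick, and each
    message leaves exactly `stages` ticks after it entered."""

    def __init__(self, stages):
        if stages < 1:
            raise ValueError("a pipeline needs at least one stage")
        self._regs = deque([None] * stages, maxlen=stages)

    def tick(self, msg_in=None):
        """Advance one clock tick and return the message leaving the last stage."""
        msg_out = self._regs[-1]
        # With maxlen set, appendleft discards the oldest entry on the right.
        self._regs.appendleft(msg_in)
        return msg_out


pipe = FixedLatencyPipeline(stages=3)   # hypothetical stage count
inputs = ["m0", "m1", "m2", None, None, None]
outputs = [pipe.tick(m) for m in inputs]
```

Three back-to-back messages enter on consecutive ticks and leave in the same order, each three ticks after entry, regardless of when the others arrived.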
Figure 11.5: Minor-mode command transitions


11.2. RPP top-level design 

This section describes the top-level design of the RPP. 


11.2.1. Block diagram 

Figure 11.6 shows the block diagram for the RPP. The Mode Control Unit (MCU) handles the high-
level control functions and the interaction with other control logic within the same FCR. The Input Unit 
(IU) handles the reception of messages, implements the Frame Synchronization protocol, and controls the 
timing of data introduction for the computation pipeline. The Input Diagnostics Unit (IDU) checks the 
content of the received messages and performs the asynchronous monitoring function for the Clique 
Detection and Clique Initialization modes. The Route and Vote Unit (RVU) performs the actual 
computation, including routing scheduled messages for the PE Broadcast protocol and dynamic voting for 
other protocols. The Node Diagnostics Unit (NDU) performs the diagnostic assessment of nodes based 
on error syndromes generated by the IU, IDU, and RVU. The Output Unit (OU) handles the
transmission of messages. The Status Monitoring Unit (SMU) assesses the status of the local node and 
the bus. The PE Input Unit (PE_IU) and the PE Output Unit (PE_OU) handle the communication with 
the PE. 


[Diagram labels: Messages from; Messages to Opposite-Kind Nodes]

Figure 11.6: RPP block diagram
The MCU, IU, and OU modules have time-driven control functions triggered by the local time or by
computed synchronization events. The IU and OU include independent schedule processing functions 
that store the computed schedule and perform the assessment. The IU reports the result of the schedule 
assessment to the MCU. 


11.2.2. Interface 

Figure 11.7 shows the VHDL entity declaration for the top-level module of the ROBUS Protocol
Processor. Signal CLK_i is the oscillator-clock signal. The “External inputs” signals correspond to the 
RPP Control Interface. Signal Reset_In_i forces a synchronous reset when set TRUE. Failure_In_i is 
coordinated with Reset_In_i and is set TRUE to indicate that the reset is due to an FCR failure. Signal 
Enable_In_i can be used to hold the RPP in an idle state after Reset_In_i is asserted. When Enable_In_i 
is asserted, the RPP becomes operational and Enable_In_i is ignored until the next time Reset_In_i is 
asserted. Signals Node_Kind_In_i and Node_Id_In_i specify the kind (i.e., RMU or BIU) and 
identification number. Failure_Out_o is asserted when the RPP detects an internal fault or a failure of 
the bus. 
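
The interplay of Reset_In_i and Enable_In_i described above can be modeled as a two-state latch. The Python sketch below is a behavioral illustration only. The class name is hypothetical, and it assumes that reset takes priority when both inputs are TRUE on the same tick, which the text does not state.

```python
class ResetEnableModel:
    """Tracks whether the RPP is idle or operational, tick by tick."""

    def __init__(self):
        self.operational = False

    def tick(self, reset_in, enable_in):
        if reset_in:
            # Synchronous reset: return to the idle state (priority is assumed).
            self.operational = False
        elif enable_in:
            # Enable is latched; later values are ignored until the next reset.
            self.operational = True
        return self.operational


model = ResetEnableModel()
stimulus = [(True, False), (False, False), (False, True),
            (False, False), (True, True), (False, False)]
trace = [model.tick(r, e) for r, e in stimulus]
```

The trace shows the node held idle after reset, becoming operational on the first enable, staying operational when enable is released, and returning to idle on the next reset.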

Signal ROBUS_Input_Vect_In_i is a VHDL record array with one record for each node of the
opposite kind. Each record has three fields: Strobe, Message, and Synd. Field Strobe is asserted every 
time a new message is received. Field Message has the content of the received message. Field Synd is a 
vector carrying the error bits generated by the corresponding receiver. 

Signal ROBUS_Output_Out_o is a record composed of field Strobe (which is asserted when a new
message is being sent) and field Message (which has the content of the message). 

The PE Interface has two parts. The interface for PE-to-RPP data flow is composed of two signals: 
PE_Input_i and PE_Read_Out_o. Signal PE_Read_Out_o is asserted when the RPP is reading the next
PE message. Signal PE_Input_i is a record with fields Message (the message from the PE) and Synd (a 
Boolean signal which is asserted when an error has been detected in the communication from the PE). 
The interface for RPP-to-PE data flow has only signal PE_Output_o, which is a record with fields 
Message (the content of the message sent to the PE) and Strobe (asserted when a new message is ready). 
    entity RPP_Unit is
      port ( CLK_i                 : in  bit ;
             -- External inputs
             Reset_In_i            : in  boolean ;
             Failure_In_i          : in  boolean ;
             Enable_In_i           : in  boolean ;
             Node_Id_In_i          : in  Node_Id_Typ ;
             Node_Kind_In_i        : in  Node_Kind_Typ ;
             Failure_Out_o         : out boolean ;
             -- Inputs from other ROBUS nodes
             ROBUS_Input_Vect_In_i : in  ROBUS_Input_Vect_Typ ;
             -- Outputs to other ROBUS nodes
             ROBUS_Output_Out_o    : out ROBUS_Output_Typ ;
             -- Inputs from PE
             -- External inputs from the PE
             PE_Input_i            : in  PE_Input_Typ ;
             PE_Read_Out_o         : out boolean ;
             -- Outputs to PE
             PE_Output_o           : out PE_Output_Typ
           ) ;
    end RPP_Unit ;





Figure 11.7: VHDL entity declaration for the top-level RPP description 


11.3. Mode Control Unit 

The MCU performs the following functions: handles the interaction with other control logic within the 
same FCR; stores the assigned node kind and node identification number; implements the mode transition 
logic (i.e., major mode, diagnostic cycle, and minor mode transitions); controls the enabling of the send 
output to opposite-kind nodes; and implements the local time function. 


11.3.1. Block diagram 

Figure 11.8 shows the block diagram for the MCU. The Controller is a 12-state Mealy machine 
(FSM) implementing the logic in Figure 11.5. Additional state is stored in the Major Mode, Diagnostic 
Cycle, and Output Status registers. The Local Time timer triggers the issuing of commands after 
synchronization is achieved. The Timer block is used to measure the duration of the Self-Test mode,
$\Delta_{STM}$ (see Section 7). The Node Id and Node Kind are loaded directly from the RPP Control Interface and
forwarded to the other RPP sub-units as part of the set of command signals. 
[Diagram labels: Node_Reset; MCU Command (to all units)]

Figure 11.8: MCU block diagram


11.3.2. Interface 

Figure 11.9 shows the VHDL entity declaration for the MCU. The “External inputs” and “External 
outputs” signals correspond to the RPP Control Interface as described previously for the top-level design. 
The signals from the Input Unit indicate whether the currently loaded PE communication schedule is 
valid or not, and whether the available schedule (loaded or default) is zero (i.e., no scheduled messages). 
Signal RVU_Sync_Reset_i indicates when it is time to reset the Local Time and start a new 
synchronization cycle. Signal SMU_Failure_i indicates when a local failure or a clique failure has been 
detected. SMU_No_Clique_i is asserted when no valid clique is detected. The outputs to the other RPP 
units include signals Node_Kind_o and Node_Id_o, which are held constant during the operation of the 
RPP. Signal MCU_Node_Reset_o commands an immediate reset of all the RPP units. MCU_Command_o
is a record signal with fields Major_Mode (current major mode), Diagnostic_Cycle, Minor_Mode,
Schedule_States (valid, zero, or invalid), and Output_Enable. Signal MCU_Ready_o is asserted to
indicate that a new MCU command is being issued. 
    entity Mode_Control_Unit is
      port ( CLK_i                 : in  bit ;
             -- External inputs
             Reset_In_i            : in  boolean ;
             Failure_In_i          : in  boolean ;
             Enable_In_i           : in  boolean ;
             Node_Id_In_i          : in  Node_Id_Typ ;
             Node_Kind_In_i        : in  Node_Kind_Typ ;
             -- External outputs
             Failure_Out_o         : out boolean ;
             -- From Input Unit
             IU_Zero_Schedule_i    : in  boolean ;
             IU_Invalid_Schedule_i : in  boolean ;
             -- From Route and Vote Unit
             RVU_Sync_Reset_i      : in  boolean ;
             -- From Status Monitoring Unit
             SMU_Failure_i         : in  boolean ;
             SMU_No_Clique_i       : in  boolean ;
             -- Output to other units
             Node_Id_o             : out Node_Id_Typ ;
             Node_Kind_o           : out Node_Kind_typ ;
             MCU_Node_Reset_o      : out boolean ;
             MCU_Command_o         : out MCU_Cmnd_Typ ;
             MCU_Ready_o           : out boolean
           ) ;
    end Mode_Control_Unit ;

Figure 11.9: VHDL entity declaration for the Mode Control Unit


11.4. Schedule Processor 

The functions assigned to the Schedule Processor include storing the schedule computed by the 
Schedule Update protocol, assessing the loaded schedule, and reading and interpreting the current 
schedule. If the currently loaded schedule is invalid, the Schedule Processor uses the default schedule. 
Since the RPP has separate reception and transmission processes (embodied by the IU and OU, 
respectively), each one is given a separate Schedule Processor. 

11.4.1. Block diagram 

Figure 11.10 shows the block diagram for the Schedule Processor. The Controller is a 7-state Mealy 
machine. The output from the RVU is stored in the Schedule Memory, which consists of a FIFO (first-in- 
first-out) buffer. The output from the Schedule Memory includes the source Id and number of scheduled 
messages for each scheduled PE, in addition to the total number of scheduled sources. The Schedule 
Assessment module determines the validity of the loaded schedule by comparing the number of scheduled
messages against the maximum allowed and by detecting a PE_ERROR result for any of the sources. If 
the schedule is invalid, the content of the Default Schedule is used. The Message Counter counts the 
number of messages processed for the current source and signals the controller when the last message has 
been reached. The Source Counter is used as a source Id generator during schedule load, and as a source 
counter during schedule execution. 
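
The assessment and fallback behavior described above can be sketched as follows. This Python model is illustrative only. The function name, the `PE_ERROR` marker value, and the list-based schedule representation are hypothetical, and the sketch assumes that the maximum applies to the total number of scheduled messages and that the default schedule contains only message counts.

```python
PE_ERROR = "PE_ERROR"   # hypothetical marker for a source whose result is PE_ERROR


def assess_schedule(loaded, default, max_messages):
    """Return (schedule_in_use, invalid, zero).

    `loaded` and `default` are lists with one entry per source: the number
    of scheduled messages, or PE_ERROR in `loaded` for a failed source.
    """
    has_error = PE_ERROR in loaded
    total = sum(n for n in loaded if n != PE_ERROR)
    invalid = has_error or total > max_messages
    in_use = default if invalid else loaded
    # "Zero" refers to the schedule actually in use (loaded or default).
    zero = sum(in_use) == 0
    return in_use, invalid, zero


ok_case = assess_schedule([2, 1, 0], [1, 1, 1], max_messages=4)
error_case = assess_schedule([2, PE_ERROR, 0], [1, 1, 1], max_messages=4)
```

A valid schedule is used as loaded, while a schedule containing a PE_ERROR entry or too many messages is replaced by the default.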


[Diagram labels: From RVU; Schedule Assessment; Current-Message Attributes]

Figure 11.10: Block diagram for the Schedule Processor


11.4.2. Interface 

Figure 11.11 shows the VHDL entity declaration for the Schedule Processor. The user of the 
Processor is either the IU or the OU. Signal Reset_i forces an immediate synchronous reset.
SCH_Proc_Cmnd_i is used to command loading or execution of a schedule. SCH_Ready_i is asserted 
when the attributes for the next message are needed. The signals from the RVU include 
RVU_Transfrm_Result_i (the result stream from the Schedule Update protocol), RVU_Ready_i (asserted
when a new schedule result is available), and RVU_Last_Msg_i (asserted when the last result has been 
reached). Outputs SCH_Zero_o and SCH_Invalid_o indicate the assessment result for the currently 
loaded schedule. SCH_Source_Id_o indicates the Id for the current source, SCH_PE_or_ACC_o is set 
to PE when the current message is a PE message or to ACC when processing Accusation Exchange 
protocol messages, and SCH_Last_Msg_o is asserted when the last message of the PE Communication 
mode has been reached. 
    entity Schedule_Processor is
      port ( CLK_i                 : in  bit ;
             -- From the user of this unit
             Reset_i               : in  boolean ;
             SCH_Proc_Cmnd_i       : in  Sched_Proc_Cmnd ;
             SCH_Ready_i           : in  boolean ;
             -- From Route and Vote Unit
             RVU_Transfrm_Result_i : in  ROBUS_Msg_Typ ;
             RVU_Ready_i           : in  boolean ;
             RVU_Last_Msg_i        : in  boolean ;
             -- Outputs
             SCH_Source_Id_o       : out Node_Id_Typ ;
             SCH_Last_Msg_o        : out boolean ;
             SCH_PE_or_ACC_o       : out PE_or_ACC_Typ ;
             SCH_Zero_o            : out boolean ;
             SCH_Invalid_o         : out boolean
           ) ;
    end Schedule_Processor ;


Figure 11.11: VHDL entity declaration for the Schedule Processor 


11.5. Input Unit 

The Input Unit performs three main functions: message reception, frame synchronization, and control 
of data introduction for the computation pipeline. The message reception modes include synchronous, 
fixed-delay, and asynchronous-monitoring. The IU performs error detection for each of these modes. 
Appendix A has the detailed IU specification. 


11.5.1. Block diagram 

Figure 11.12 shows the block diagram for the Input Unit. The Controller is a 35-state Mealy machine
that uses the MCU commands, RVU sync events, and the output of the Frame Synchronizer as timing 
references to control the execution of the modules within the IU. Some Pipeline Control signals are 
generated by the Controller and the rest are taken directly from the Schedule Processor. There is one 
Message Receiver for each opposite-kind node. These modules are used for the three reception modes.
For synchronous reception, the Input Buffer behaves as a FIFO buffer with data being buffered upon 
arrival and read from the buffer at predetermined values of the local time. For fixed-delay reception, the
sync pulses for INIT and ECHO messages are generated a fixed delay after their reception. For 
asynchronous reception, the Input Buffer performs as a 1-tick delay register. The Error Detector is active 
for all modes of reception. The error checks include link error (received from the communication 
receivers), non-arrival of an expected message, reception of more messages than expected, input reception 
rate too high, and reception of an unexpected message. The Frame Synchronizer implements the frame 
synchronization protocol described in Appendix A. 
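
The synchronous reception mode described above (buffer on arrival, read at predetermined local times) can be sketched with a small Python model. It is an illustration only, not the Message Receiver design. The class and method names are hypothetical, and only one of the listed error checks, non-arrival of an expected message, is modeled.

```python
from collections import deque


class SyncInputBuffer:
    """FIFO that is written on message arrival and read only at scheduled
    values of the local time."""

    def __init__(self, read_times):
        self._fifo = deque()
        self._read_times = set(read_times)

    def arrive(self, message):
        """Buffer a message as soon as it is received."""
        self._fifo.append(message)

    def tick(self, local_time):
        """At a scheduled read time return (message, missing); otherwise None."""
        if local_time not in self._read_times:
            return None
        if self._fifo:
            return self._fifo.popleft(), False
        return None, True   # expected message did not arrive in time


buf = SyncInputBuffer(read_times=[10, 20])   # hypothetical read schedule
buf.arrive("a")
early = buf.tick(9)
first = buf.tick(10)
second = buf.tick(20)
```

The message that arrived before tick 10 is delivered at tick 10 regardless of its exact arrival time, and the empty read at tick 20 is flagged as a non-arrival.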
[Diagram label: Inputs from Communication Receivers]

Figure 11.12: Block diagram for the Input Unit


11.5.2. Interface 

Figure 11.13 shows the VHDL entity declaration for the Input Unit. The inputs from the MCU are 
sent directly to the IU controller. Signal ROBUS_Input_Vect_i is a vector of records, with one record
for each node of the opposite kind. The fields of these records include Message (the message content), 
Synd (the vector of link-error syndromes generated by the corresponding communication receiver), and 
Strobe (which is asserted when a new message is received). Signals IDU_Invalid_Input_i and 
IDU_Ready_i specify the initial set of eligible participants for the Frame Synchronization protocol. The 
inputs from the RVU include the sync event signals RVU_Accept_Init_i and RVU_Accept_Echo_i
(each one generated by the corresponding Accept function), in addition to RVU_Transfrm_Result_i, 
RVU_Ready_i, and RVU_Last_Msg_i, which are used to load the results of the Schedule Update 
protocol. 

The outputs from the IU can be divided conceptually into several groups. Signals IU_Init_o and 
IU_Echo_o represent synchronization events received in the form of INIT and ECHO messages. These
events are converted into pulses, which are then used by the RVU to compute the Accept(INIT) and 
Accept(ECHO) functions. Signal IU_Input_Msg_o is a vector containing the received message content 
from each opposite-kind node. The pipeline control signals include IU_Input_Strobe_o (for each input
port, the corresponding vector element is asserted one tick after a message is received), IU_Ready_o 
(asserted at critical timing points in the processing of synchronization protocols, and when it is time to 
process a synchronous-protocol message), IU_Receiving_o (asserted when the IU is expecting to receive
messages), IU_PE_or_ACC_o (taken from the Schedule Processor), IU_Source_Id_o (taken from the 
Schedule Processor), and IU_Last_Msg_o (asserted when the last message in a stream is being processed 
during Schedule Update and PE Communication). Signals IU_Synd_o and IU_Imm_Synd_o are of type 
record and carry the error syndromes. IU_Invalid_Schedule_o and IU_Zero_Schedule_o are the 
assessment results generated by the Schedule Processor. 
    entity Input_Unit is
      port ( CLK_i                 : in  bit ;
             -- From Mode Control Unit
             Node_Id_i             : in  Node_Id_Typ ;
             Node_Kind_i           : in  Node_Kind_typ ;
             MCU_Node_Reset_i      : in  boolean ;
             MCU_Command_i         : in  MCU_Cmnd_Typ ;
             MCU_Ready_i           : in  boolean ;
             -- From other ROBUS nodes
             ROBUS_Input_Vect_i    : in  ROBUS_Input_Vect_Typ ;
             -- From Input Diagnostics Unit
             IDU_Invalid_Input_i   : in  OK_Bool_Vect_Typ ;
             IDU_Ready_i           : in  boolean ;
             -- From Route and Vote Unit
             RVU_Accept_Init_i     : in  boolean ;
             RVU_Accept_Echo_i     : in  boolean ;
             RVU_Transfrm_Result_i : in  ROBUS_Msg_Typ ;
             RVU_Ready_i           : in  boolean ;
             RVU_Last_Msg_i        : in  boolean ;
             -- Output to other units
             IU_Init_o             : out OK_Sync_Pls_2D_Vect_Typ ;
             IU_Echo_o             : out OK_Sync_Pls_2D_Vect_Typ ;
             IU_Input_Msg_o        : out OK_Msg_Vect_Typ ;
             IU_Input_Strobe_o     : out OK_Bool_Vect_Typ ;
             IU_Ready_o            : out boolean ;
             IU_Receiving_o        : out boolean ;
             IU_PE_or_ACC_o        : out PE_or_ACC_Typ ;
             IU_Source_Id_o        : out Source_Id_Typ ;
             IU_Last_Msg_o         : out boolean ;
             IU_Synd_o             : out IU_Synd_Typ ;
             IU_Immd_Synd_o        : out IU_Imm_Synd_Typ ;
             IU_Zero_Schedule_o    : out boolean ;
             IU_Invalid_Schedule_o : out boolean
           ) ;
    end Input_Unit ;



Figure 11.13: VHDL entity declaration for the Input Unit 


11.6. Input Diagnostics Unit 

The main purpose of the IDU is to identify invalid inputs. The IDU combines the results of its own 
error checks with the error syndromes from the Input Unit to detect invalid inputs. When the RPP is 
unsynchronized and using asynchronous-monitoring communication during Clique Detection and Clique 
Initialization modes, the IDU performs monitoring and error detection for each opposite kind node. The 
monitoring functions include rate checks on the reception of ECHO messages during Local Diagnosis 
Acquisition in the Clique Detection mode, and message sequence checks during Clique Detection and 
Clique Initialization modes. After the RPP synchronizes to the clique, the IDU performs error detection 
based on the content of received messages. 


11.6.1. Block diagram 

Figure 11.14 shows the block diagram for the IDU. The Controller is an 18-state Mealy machine. 
Each input lane is processed separately. For asynchronous monitoring, the rate and sequence monitoring 
results are combined with the IU reception error syndromes and Frame Synchronization protocol errors to 
determine validity. The Rate Monitor includes a 9-state Mealy machine and a counter. The Sequence 
Monitor includes a 16-state Mealy machine and a counter. For synchronized operations, validity is based 
on content error checks and IU reception error syndromes. 



Figure 11.14: Block diagram for the Input Diagnostics Unit 
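
The per-lane validity decision described above can be illustrated with a short Python sketch. It is a behavioral illustration only, not the IDU implementation: the LaneStatus record, its field names, and the use of a plain logical OR to combine the individual error indications are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class LaneStatus:
    """Error indications for one input lane (hypothetical field names)."""
    rate_error: bool = False        # from the Rate Monitor
    sequence_error: bool = False    # from the Sequence Monitor
    frame_sync_error: bool = False  # Frame Synchronization protocol error
    reception_error: bool = False   # IU reception error syndrome
    content_error: bool = False     # content check on a received message


def input_invalid(lane: LaneStatus, synchronized: bool) -> bool:
    """Return True when the lane is declared invalid.

    Assumption: the individual indications are combined with a logical OR.
    """
    if synchronized:
        # Synchronized operation: content checks plus reception syndromes.
        return lane.content_error or lane.reception_error
    # Asynchronous monitoring: rate, sequence, frame sync, and reception.
    return (lane.rate_error or lane.sequence_error
            or lane.frame_sync_error or lane.reception_error)
```

In this sketch a rate error matters only during asynchronous monitoring, while a content error matters only after synchronization; a reception error invalidates the lane in both cases.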


11.6.2. Interface 

Figure 11.15 is the VHDL entity declaration for the IDU. The inputs include the clock CLK_i, the 
MCU command, and the IU outputs. The IDU outputs include the sync pulses IDU_Init_o and 
IDU_Echo_o; the received messages IDU_Input_Msg_o; and the pipeline control signals 
IDU_Ready_o, IDU_PE_or_ACC_o, IDU_Source_Id_o, and IDU_Last_Msg_o. Signal 
IDU_Invalid_Input_o specifies the valid inputs, and IDU_Synd_o carries the content error syndromes. 

entity Input_Diagnostics_Unit is
  port ( CLK_i               : in  bit;
         -- From Mode Control Unit
         Node_Id_i           : in  Node_Id_Typ;
         Node_Kind_i         : in  Node_Kind_typ;
         MCU_Node_Reset_i    : in  boolean;
         MCU_Command_i       : in  MCU_Cmnd_Typ;
         MCU_Ready_i         : in  boolean;
         -- From Input Unit
         IU_Init_i           : in  OK_Sync_Pls_2D_Vect_Typ;
         IU_Echo_i           : in  OK_Sync_Pls_2D_Vect_Typ;
         IU_Input_Msg_i      : in  OK_Msg_Vect_Typ;
         IU_Input_Strobe_i   : in  OK_Bool_Vect_Typ;
         IU_Ready_i          : in  boolean;
         IU_Receiving_i      : in  boolean;
         IU_PE_or_ACC_i      : in  PE_or_ACC_Typ;
         IU_Source_Id_i      : in  Source_Id_Typ;
         IU_Last_Msg_i       : in  boolean;
         IU_Synd_i           : in  IU_Synd_Typ;
         IU_Immd_Synd_i      : in  IU_Imm_Synd_Typ;
         -- Output to other units
         IDU_Init_o          : out OK_Sync_Pls_2D_Vect_Typ;
         IDU_Echo_o          : out OK_Sync_Pls_2D_Vect_Typ;
         IDU_Input_Msg_o     : out OK_Msg_Vect_Typ;
         IDU_Ready_o         : out boolean;
         IDU_PE_or_ACC_o     : out PE_or_ACC_Typ;
         IDU_Source_Id_o     : out Source_Id_Typ;
         IDU_Last_Msg_o      : out boolean;
         IDU_Invalid_Input_o : out OK_Bool_Vect_Typ;
         IDU_Synd_o          : out IDU_Synd_Typ );
end Input_Diagnostics_Unit;



Figure 11.15: VHDL entity declaration for the Input Diagnostics Unit 


11.7. Route and Vote Unit 

The Route and Vote Unit is where the main computation takes place. The Input Eligible Voters for 
the dynamic voting functions are computed by the RVU using the Invalid Inputs computed by the IDU 
and the Accusations and Convictions provided by the Node Diagnostics Unit. For the synchronization 
protocols, the RVU computes Accept(INIT) and Accept(ECHO) functions. For the Schedule Update, PE 
Broadcast, and Collective Diagnosis protocols the RVU computes exact-match majority word vote 
functions (i.e., the unit of data is the full content of a message). For the PE Broadcast protocol, the RVU 
also performs a routing function that selects an input to be forwarded to the output. For the Collective 
Diagnosis and Accusation Exchange protocols, the RVU computes message-wide Boolean majority bit 
vote functions (i.e., each bit location is voted independently). The RVU performs error checks for each 
dynamic voting function, as well as for the Input Eligible Voters function. 
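
To make the distinction between the two vote types concrete, the following Python sketch models an exact-match majority word vote and a bit-wise majority vote over a set of eligible inputs. It is an illustrative behavioral model, not the RVU implementation: the function names, the use of integers as message words, the strict-majority threshold (more than half of the eligible voters), and the use of None to signal a no-majority or no-eligible-voters condition are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Optional


def word_vote(msgs: List[int], eligible: List[bool]) -> Optional[int]:
    """Exact-match majority vote: the full message is the unit of data."""
    voters = [m for m, ok in zip(msgs, eligible) if ok]
    if not voters:
        return None  # no eligible voters (assumed signaling convention)
    value, count = Counter(voters).most_common(1)[0]
    # A result exists only if more than half of the voters match exactly.
    return value if 2 * count > len(voters) else None


def bit_vote(msgs: List[int], eligible: List[bool], width: int) -> int:
    """Bit-wise majority vote: each bit location is voted independently."""
    voters = [m for m, ok in zip(msgs, eligible) if ok]
    result = 0
    for bit in range(width):
        ones = sum((m >> bit) & 1 for m in voters)
        if 2 * ones > len(voters):
            result |= 1 << bit
    return result
```

For example, with the three eligible inputs 0b1100, 0b1010, and 0b0110, the word vote finds no majority because no two messages match exactly, while the bit vote returns 0b1110 because each of the upper three bit positions is set in two of the three inputs.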

11.7.1. Block diagram 

Figure 11.16 shows the block diagram for the RVU. The Controller is a 24-state Mealy machine. The 
Timer is used to measure the synchronization-reset delay for the synchronization protocols. The Input 
Eligible Voters block computes the eligible voters (IEV) for all the dynamic voting functions. This 
module also performs checks for the conditions of no-eligible-voters and invalid-initial-eligible-voters for 
the Synchronization Capture and Initial Synchronization protocols, as stated in the RPP requirements. 
The Accept functions use the widths of the sync pulses to check for no-majority conditions among their 
IEVs and to determine disagreement with the result of the function. The Content Transformation 
block performs transformations required by the protocols beyond the basic dynamic voting and routing 
operations. 


[Block diagram. Inputs: Acc & Conv (from NDU), Invalid Inputs (from IDU), IDU Pipeline Control, and MCU Command. Outputs: Selection Result, Transformation Result, Accept(INIT), and RVU Pipeline Control.]


Figure 11.16: Block diagram for the Route and Vote Unit 


11.7.2. Interface 

Figure 11.17 shows the VHDL entity declaration for the RVU. The inputs from the MCU are sent 
directly to the RVU controller. The IDU inputs are used as described above for the block diagram. The 
NDU inputs include the accusations and convictions against the nodes of the same and opposite kinds. 
The RVU outputs include synchronization events (RVU_Accept_Init_o, RVU_Accept_Echo_o, and 
RVU_Sync_Reset_o), content results (RVU_Transfrm_Result_o and RVU_Select_Result_o), pipeline 
control outputs (RVU_Ready_o, RVU_PE_or_ACC_o, RVU_Source_Id_o, and RVU_Last_Msg_o), 
and the error syndromes (RVU_Synd_o). The RVU syndromes are disagreement with the result of 
Accept(INIT), disagreement with the result of Accept(ECHO), disagreement with the results of the bit 
vote, disagreement with the result of the word vote, no-majority for Accept(INIT), no-majority for 
Accept(ECHO), no-majority for the word vote, no eligible voters, and invalid eligible voters. 

entity Route_and_Vote_Unit is
  port ( CLK_i                 : in  bit;
         -- From Mode Control Unit
         Node_Id_i             : in  Node_Id_Typ;
         Node_Kind_i           : in  Node_Kind_typ;
         MCU_Node_Reset_i      : in  boolean;
         MCU_Command_i         : in  MCU_Cmnd_Typ;
         MCU_Ready_i           : in  boolean;
         -- From Input Diagnostics Unit
         IDU_Init_i            : in  OK_Sync_Pls_2D_Vect_Typ;
         IDU_Echo_i            : in  OK_Sync_Pls_2D_Vect_Typ;
         IDU_Input_Msg_i       : in  OK_Msg_Vect_Typ;
         IDU_Ready_i           : in  boolean;
         IDU_PE_or_ACC_i       : in  PE_or_ACC_Typ;
         IDU_Source_Id_i       : in  Source_Id_Typ;
         IDU_Last_Msg_i        : in  boolean;
         IDU_Invalid_Input_i   : in  OK_Bool_Vect_Typ;
         -- From Node Diagnostics Unit
         NDU_Acc_and_Conv_i    : in  Acc_and_Conv_Typ;
         -- Output to other units
         RVU_Accept_Init_o     : out boolean;
         RVU_Accept_Echo_o     : out boolean;
         RVU_Sync_Reset_o      : out boolean;
         RVU_Transfrm_Result_o : out ROBUS_Msg_Typ;
         RVU_Ready_o           : out boolean;
         RVU_PE_or_ACC_o       : out PE_or_ACC_Typ;
         RVU_Source_Id_o       : out Source_Id_Typ;
         RVU_Last_Msg_o        : out boolean;
         RVU_Select_Result_o   : out ROBUS_Msg_Typ;
         RVU_Synd_o            : out RVU_Synd_Typ );
end Route_and_Vote_Unit;





Figure 11.17: VHDL entity declaration for the Route and Vote Unit 


11.8. Node Diagnostics Unit 

The main purpose of the Node Diagnostics Unit is to diagnose all the nodes in the system using the 
syndromes from the IU, IDU, and RVU. The diagnostics rules are described in detail in [Torres 05]. The 
NDU generates suspicions and accusations against nodes of the same and opposite kinds. It also stores the 
conviction results computed during the execution of the Collective Diagnosis and Collective Diagnosis 
Acquisition protocols. 

11.8.1. Block diagram 

Figure 11.18 shows the block diagram for the NDU. The Controller is a 26-state Mealy machine. The 
Suspicion Generators generate suspicions based on the syndromes of disagreement with the result of the 
word vote and the bit vote. The Controller specifies when these syndromes are valid for the generation of 
suspicions. The generated suspicions are stored in the Suspicions Matrix, a 2-dimensional Boolean array 
with one row per node of the same kind and one column per node of the opposite kind. The accumulated 
suspicions are reduced after the execution of the Accusation Exchange protocol by performing row-wise 
and column-wise dynamic bit vote operations for every row and every column. The eligible voters are 
based on the accusations and convictions held by the NDU at the time of the vote. 
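
The row-wise and column-wise reduction of the Suspicions Matrix can be pictured with the following Python sketch. It is only a rough illustration under stated assumptions: the function names are hypothetical, a strict majority of the eligible voters is assumed as the vote rule, and the interpretation that a row vote summarizes one same-kind node while a column vote summarizes one opposite-kind node is an assumption based on the matrix layout described above. The actual diagnostic rules are given in [Torres 05].

```python
from typing import List, Tuple


def majority(bits: List[bool], eligible: List[bool]) -> bool:
    """Dynamic bit vote: strict majority among the eligible voters only."""
    votes = [b for b, ok in zip(bits, eligible) if ok]
    return 2 * sum(votes) > len(votes)


def reduce_suspicions(matrix: List[List[bool]],
                      row_eligible: List[bool],
                      col_eligible: List[bool]) -> Tuple[List[bool], List[bool]]:
    """Reduce the matrix to one Boolean per row and one per column.

    Rows are indexed by same-kind node, columns by opposite-kind node.
    """
    # Row-wise vote: the voters are the eligible opposite-kind columns.
    rows = [majority(row, col_eligible) for row in matrix]
    # Column-wise vote: the voters are the eligible same-kind rows.
    cols = [majority([row[c] for row in matrix], row_eligible)
            for c in range(len(col_eligible))]
    return rows, cols
```

With two same-kind nodes and three opposite-kind nodes, all eligible, the matrix [[True, True, False], [False, True, False]] reduces to row results [True, False] and column results [False, True, False].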

During asynchronous-monitoring, the IDU performs the diagnosis of opposite-kind nodes. At the end 
of that communication mode, the diagnostics results are loaded directly from the IDU Invalid_Inputs 
signal as the current opposite-kind accusations. 

The Temporary Accusations Register is used during the execution of the Collective Diagnosis protocol 
to hold the accusations for the previous diagnostic cycle and enable the accumulation of new accusations 
for the next diagnostic cycle while the convictions are being computed. 


[Block diagram. Inputs: RVU Selection Result, RVU Syndromes, IDU Syndromes, IU Syndromes, and IDU Invalid_Inputs. Outputs are sent to the RVU, the SMU, and the OU.]

Figure 11.18: Block diagram for the Node Diagnostics Unit 

11.8.2. Interface 


Figure 11.19 shows the VHDL entity declaration for the NDU. The inputs consist of the MCU 
command, the IU syndromes with the relevant timing signals, the IDU syndromes and Invalid_Input 
signal with the timing signals, and RVU syndromes and Selection Result output with the pipeline control 
signals. The outputs include accusations and convictions as a record data type, and the accusation sent to 
the Output Unit as a ROBUS Message. 


entity Node_Diagnostics_Unit is
  port ( CLK_i               : in  bit;
         -- From Mode Control Unit
         Node_Id_i           : in  Node_Id_Typ;
         Node_Kind_i         : in  Node_Kind_typ;
         MCU_Node_Reset_i    : in  boolean;
         MCU_Command_i       : in  MCU_Cmnd_Typ;
         MCU_Ready_i         : in  boolean;
         -- From Input Unit
         IU_Input_Strobe_i   : in  OK_Bool_Vect_Typ;
         IU_Ready_i          : in  boolean;
         IU_Receiving_i      : in  boolean;
         IU_Synd_i           : in  IU_Synd_Typ;
         IU_Immd_Synd_i      : in  IU_Imm_Synd_Typ;
         -- From Input Diagnostics Unit
         IDU_Ready_i         : in  boolean;
         IDU_Invalid_Input_i : in  OK_Bool_Vect_Typ;
         IDU_Synd_i          : in  IDU_Synd_Typ;
         -- From Route and Vote Unit
         RVU_Ready_i         : in  boolean;
         RVU_PE_or_ACC_i     : in  PE_or_ACC_Typ;
         RVU_Source_Id_i     : in  Source_Id_Typ;
         RVU_Last_Msg_i      : in  boolean;
         RVU_Select_Result_i : in  ROBUS_Msg_Typ;
         RVU_Synd_i          : in  RVU_Synd_Typ;
         -- Output to other units
         NDU_Acc_and_Conv_o  : out Acc_and_Conv_Typ;
         NDU_Diag_Msg_o      : out ROBUS_Msg_Typ );
end Node_Diagnostics_Unit;



Figure 11.19: VHDL entity declaration for the Node Diagnostics Unit 


11.9. Status Monitoring Unit 

The purpose of the SMU is to detect a local failure or a clique failure, and to detect when a clique is 
not present on the bus. This assessment is done based on the accusations and convictions from the NDU, 
and the protocol results from the RVU. The SMU also includes a timeout check for the Synchronization 
Acquisition and Initial Synchronization protocols, as well as for the resynchronization interval after 
synchronization is achieved. 


11.9.1. Block diagram 

Figure 11.20 shows the block diagram for the SMU. The Controller is a 23-state Mealy machine. The 
Sync_Reset signal from the RVU is the timing reference for the timeout checks. The RVU syndromes, 
including no-majority for Accept(INIT), no-majority for Accept(ECHO), no-majority for word vote, no 
eligible voters, and invalid eligible voters, are monitored by the controller to detect protocol failures. In 
addition, checks are performed on the RVU Selection Result output. These checks include comparing 
the message sent by a source BIU against the corresponding RVU result during the PE Broadcast 
protocol, comparing the results of processes P2 and P4 of the Collective Diagnosis protocol, and checking 
the results of the Collective Diagnosis protocol for all-convicted conditions or a 
conviction-against-local-node result. The Status Monitoring block combines the NDU diagnosis, timeout checks, and protocol 
failure indicators to detect failure conditions and the absence of a valid clique. 


[Block diagram. Inputs: MCU Command and RVU Outputs. Outputs: Failure and No_Clique, both sent to the MCU.]


Figure 11.20: Block diagram for the Status Monitoring Unit 

11.9.2. Interface 


Figure 11.21 shows the VHDL entity declaration for the SMU. Signal OU_Read_i from the Output 
Unit is asserted when a PE message is read from the PE Input Unit (PE_IU). 


entity Status_Monitoring_Unit is
  port ( CLK_i               : in  bit;
         -- From Mode Control Unit
         Node_Id_i           : in  Node_Id_Typ;
         Node_Kind_i         : in  Node_Kind_typ;
         MCU_Node_Reset_i    : in  boolean;
         MCU_Command_i       : in  MCU_Cmnd_Typ;
         MCU_Ready_i         : in  boolean;
         -- From Route and Vote Unit
         RVU_Sync_Reset_i    : in  boolean;
         RVU_Ready_i         : in  boolean;
         RVU_PE_or_ACC_i     : in  PE_or_ACC_Typ;
         RVU_Source_Id_i     : in  Source_Id_Typ;
         RVU_Last_Msg_i      : in  boolean;
         RVU_Select_Result_i : in  ROBUS_Msg_Typ;
         RVU_Synd_i          : in  RVU_Synd_Typ;
         -- From Node Diagnostics Unit
         NDU_Acc_and_Conv_i  : in  Acc_and_Conv_Typ;
         -- From PE Input Unit
         PEIU_Msg_i          : in  ROBUS_Msg_Typ;
         -- From Output Unit
         OU_Read_i           : in  Boolean;
         -- Output to other units
         SMU_Failure_o       : out boolean;
         SMU_No_Clique_o     : out boolean );
end Status_Monitoring_Unit;



Figure 11.21: VHDL entity declaration for the Status Monitoring Unit 


11.10. Output Unit 

The Output Unit is responsible for sending messages triggered by the local time or RVU-computed 
synchronization events. The messages are either generated internally (INIT and ECHO for 
synchronization protocols, and INITIALIZATION for Initial Diagnosis), or read from the PE Input Unit, 
the RVU, or the NDU. 

11.10.1. Block diagram 


Figure 11.22 shows the block diagram for the OU. OK stands for “Opposite Kind” and RM for 
“ROBUS Message”. The Controller is a 26-state Mealy machine. The RVU transformation results are 
buffered until it is time to send them. OU_Read indicates when the OU is reading a PE Message from the 
PE_IU. The OU decides when to assert the Output Strobe signal based on the protocol being executed 
without considering the Major Mode or Diagnostic Cycle. It is the responsibility of the MCU to specify 
when to enable the sending of messages. 


[Block diagram. Labeled inputs include the RVU outputs and the MCU Command.]


Figure 11.22: Block diagram for the Output Unit 


11.10.2. Interface 

Figure 11.23 shows the VHDL entity declaration for the OU. Signal ROBUS_Output_o is of 
record type with fields Message and Strobe. 

entity Output_Unit is
  port ( CLK_i                 : in  bit;
         -- From Mode Control Unit
         Node_Id_i             : in  Node_Id_Typ;
         Node_Kind_i           : in  Node_Kind_typ;
         MCU_Node_Reset_i      : in  boolean;
         MCU_Command_i         : in  MCU_Cmnd_Typ;
         MCU_Ready_i           : in  boolean;
         -- From Route and Vote Unit
         RVU_Accept_Init_i     : in  Boolean;
         RVU_Accept_Echo_i     : in  Boolean;
         RVU_Transfrm_Result_i : in  ROBUS_Msg_Typ;
         RVU_Ready_i           : in  Boolean;
         RVU_Last_Msg_i        : in  boolean;
         -- From Node Diagnostics Unit
         NDU_Diag_Msg_i        : in  ROBUS_Msg_Typ;
         -- Internal outputs
         OU_Read_o             : out Boolean;
         -- From PE Input Unit
         PEIU_Msg_i            : in  ROBUS_Msg_Typ;
         -- External/ROBUS outputs
         ROBUS_Output_o        : out ROBUS_Output_Typ );
end Output_Unit;



Figure 11.23: VHDL entity declaration for the Output Unit 


11.11. PE Input Unit 

The PE Input Unit (PE_IU) is the interface between the PE and the RPP Output Unit. The RPP is 
designed with a FIFO abstraction for the PE input interface. The PE messages are presumed to be 
available whenever the RPP decides to read a message. The PE is responsible for reporting to the RPP 
when there is a problem such that a valid message is not available. 


11.11.1. Block diagram 

Figure 11.24 shows the block diagram for the PE_IU. The data from the PE is carried as the Payload 
of a DATA-tagged ROBUS Message. If an error is signaled by the PE, then a PE_ERROR message is 
used. The OU_Read signal is sent directly to the PE side of the interface. 

Figure 11.24: Block Diagram for the PE Input Unit 


11.11.2. Interface 

Figure 11.25 shows the VHDL entity declaration for the PE_IU. The MCU command is not used. 
PEIU_Msg_o is the PE message sent to the OU. PE_Input_i is a record with fields Message (the PE 
payload) and Synd (asserted when a valid payload is not available). PEIU_Read_Out_o is connected 
directly to OU_Read_i. 


entity PE_Input_Unit is
  port ( CLK_i            : in  bit;
         -- From Mode Control Unit
         Node_Id_i        : in  Node_Id_Typ;
         Node_Kind_i      : in  Node_Kind_typ;
         MCU_Node_Reset_i : in  boolean;
         MCU_Command_i    : in  MCU_Cmnd_Typ;
         MCU_Ready_i      : in  boolean;
         -- From Output Unit
         OU_Read_i        : in  boolean;
         -- Internal outputs
         PEIU_Msg_o       : out ROBUS_Msg_Typ;
         -- External inputs from the PE
         PE_Input_i       : in  PE_Input_Typ;
         -- External outputs to the PE
         PEIU_Read_Out_o  : out Boolean );
end PE_Input_Unit;



Figure 11.25: VHDL entity declaration for the PE Input Unit 
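
The conversion from the PE input record to a ROBUS Message can be sketched in Python as follows. This is a minimal illustration, not the PE_IU implementation: the RobusMsg class, the string tags, and the choice of a zero payload for the PE_ERROR case are assumptions made for the example; only the DATA and PE_ERROR message names and the Message and Synd fields come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobusMsg:
    """Hypothetical stand-in for ROBUS_Msg_Typ: a tag plus a payload."""
    tag: str
    payload: int


def pe_input_to_msg(message: int, synd: bool) -> RobusMsg:
    """Build the message that the PE_IU would present to the Output Unit.

    synd is asserted by the PE when a valid payload is not available.
    """
    if synd:
        # Assumption: the payload of a PE_ERROR message is not meaningful,
        # so it is set to zero here.
        return RobusMsg("PE_ERROR", 0)
    return RobusMsg("DATA", message)
```

Under these assumptions, a valid payload of 42 yields a DATA message carrying 42, and any payload accompanied by an asserted Synd yields a PE_ERROR message.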


11.12. PE Output Unit 

The PE Output Unit sends messages to the PE. The PE_OU messages include Collective Diagnosis 
convictions, Schedule Update results, PE Broadcast results, and INIT and ECHO synchronization 
messages. In addition, the PE_OU sends the assessment result for the current PE communication 
schedule, the current major mode, and the node Id number. 


11.12.1. Block diagram 


Figure 11.26 shows the block diagram for the PE_OU. The Controller is a 13-state Mealy machine. 
The timing of the Output Strobe signal is based on the timing of the MCU Command, RVU 
Accept(INIT), RVU Accept(ECHO), and RVU Pipeline Control signals. The Major Mode and Node Id 
read from the MCU are used as payload field contents for ROBUS Messages sent to the PE. Other 
ROBUS Messages generated within the PE_OU include VALID_SCHEDULE, INVALID_SCHEDULE, 
ZERO_SCHEDULE, INIT, and ECHO SPECIAL-tagged ROBUS Messages. RVU Transformation 
Results are sent to the PE without modification. 


[Block diagram. Labeled inputs include the MCU Command.]


Figure 11.26: Block Diagram for the PE Output Unit 


11.12.2. Interface 

Figure 11.27 shows the VHDL entity declaration for the PE Output Unit. Signal PE_Output_o is of 
record type with fields Message and Strobe. 

entity PE_Output_Unit is
  port ( CLK_i                 : in  bit;
         -- From Mode Control Unit
         Node_Id_i             : in  Node_Id_Typ;
         Node_Kind_i           : in  Node_Kind_typ;
         MCU_Node_Reset_i      : in  boolean;
         MCU_Command_i         : in  MCU_Cmnd_Typ;
         MCU_Ready_i           : in  boolean;
         -- From Route and Vote Unit
         RVU_Accept_Init_i     : in  boolean;
         RVU_Accept_Echo_i     : in  boolean;
         RVU_Transfrm_Result_i : in  ROBUS_Msg_Typ;
         RVU_Ready_i           : in  boolean;
         RVU_PE_or_ACC_i       : in  PE_or_ACC_Typ;
         RVU_Source_Id_i       : in  Source_Id_Typ;
         RVU_Last_Msg_i        : in  boolean;
         -- External outputs
         PE_Output_o           : out PE_Output_Typ );
end PE_Output_Unit;



Figure 11.27: VHDL entity declaration for the PE Output Unit 

12. Behavioral parameters 

This section presents the formulas for the behavioral parameters in the VHDL description of the RPP. 
The parameters are grouped according to the relevant RPP unit. 


12.1. Mode Control Unit (MCU) 

12.1.1. Min_Cmd_DII 

Min_Cmd_DII denotes the minimum allowed separation between MCU commands. The PE Output 
Unit requires a minimum of 3 ticks between commands. 

Min_Cmd_DII = 3 (12.1) 

The current design of the RPP imposes the following constraint. 

Min_Cmd_DII ≥ 3 (12.2) 

12.1.2. ST_Xtra_Dly 

ST_Xtra_Dly denotes the extra delay required to meet the minimum-duration constraint for the Self- 
Test mode. In addition to the self-test delay, the MCU consumes 2 ticks to command a node reset and set 
the major mode variable to Self-Test. 

ST_Xtra_Dly = $\Delta_{STM}$ - Min_Cmd_DII - 2 (12.3) 

Note that the current version of the MCU controller can cause the actual Self-Test delay to 
exceed $\Delta_{STM}$ by Min_Cmd_DII - 1 ticks. The current design of the RPP imposes the following 
constraint. 

ST_Xtra_Dly > 1 (12.4) 


12.1.3. ID_Dly 

ID_Dly denotes the delay in starting the Initial Diagnosis after entering the Clique Initialization mode. 

ID_Dly = $\Delta_{ID,begin}$ (12.5) 

The current design of the RPP imposes the following constraint. 

ID_Dly > 1 (12.6) 

12.1.4. LT_CD 

LT_CD denotes the local time at which to begin the execution of the Collective Diagnosis protocol. 

LT_CD = $T_{CD}$ (12.7) 

The current design of the RPP imposes the following constraint. 

LT_CD > 1 (12.8) 

12.1.5. LT_SU 

LT_SU denotes the local time at which to begin the execution of the Schedule Update protocol. 

LT_SU = $T_{SU}$ (12.9) 

The current design of the RPP imposes the following constraint. 

LT_SU > 4 (12.10) 

12.1.6. LT_PE 

LT_PE denotes the local time at which to begin the execution of the PE Communication protocols. 

LT_PE = $T_{PE}$ (12.11) 

The current design of the RPP imposes the following constraint. 

LT_PE > 7 (12.12) 

12.1.7. LT_SP 

LT_SP denotes the local time at which to begin the execution of the Synchronization Preservation 
protocol. 

LT_SP = $T_{SP}$ (12.13) 

The current design of the RPP imposes the following constraint. 

LT_SP > 10 (12.14) 
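
The MCU parameter definitions and their design constraints lend themselves to a simple consistency check. The Python sketch below is a hedged illustration, not part of the RPP design flow: the function names and the dictionary layout are hypothetical, the bounds are the numeric limits printed in constraints (12.2) through (12.14), and the default of treating each bound as inclusive ("at least") is an assumption, motivated by the fact that Min_Cmd_DII is itself set to 3. A strict flag is provided for the opposite reading.

```python
from typing import Dict, List

# Numeric bounds taken from constraints (12.2), (12.4), (12.6), (12.8),
# (12.10), (12.12), and (12.14).
LOWER_BOUNDS: Dict[str, int] = {
    "Min_Cmd_DII": 3,
    "ST_Xtra_Dly": 1,
    "ID_Dly": 1,
    "LT_CD": 1,
    "LT_SU": 4,
    "LT_PE": 7,
    "LT_SP": 10,
}


def st_xtra_dly(delta_stm: int, min_cmd_dii: int = 3) -> int:
    """Equation (12.3): extra delay for the Self-Test mode, in ticks."""
    return delta_stm - min_cmd_dii - 2


def violations(params: Dict[str, int], strict: bool = False) -> List[str]:
    """Return the names of the parameters that violate their bound.

    Assumption: bounds are inclusive unless strict=True is requested.
    """
    bad = []
    for name, bound in LOWER_BOUNDS.items():
        value = params[name]
        ok = value > bound if strict else value >= bound
        if not ok:
            bad.append(name)
    return bad
```

For instance, a Self-Test delay of 10 ticks gives st_xtra_dly(10) equal to 5, and a parameter set in which LT_SU is 3 is reported as violating only the LT_SU bound.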

12.2. Schedule Processor 


12.2.1. Max_Num_PE_Msg 

Max_Num_PE_Msg denotes the maximum number of PE messages that can be processed during the 
execution of the PE Communication protocols. 

Max_Num_PE_Msg = $K_{PE,sched}|_{max}$ (12.15) 

The current design of the RPP imposes the following constraint. 

Max_Num_PE_Msg > 0 (12.16) 

12.2.2. Dflt_Num_PE_Msg 

Dflt_Num_PE_Msg denotes the default number of messages per PE for the default schedule. The 
value of this parameter is application-dependent. The current design of the RPP imposes the following 
constraints. 

Dflt_Num_PE_Msg > 0 (12.17) 

N * Dflt_Num_PE_Msg < $K_{PE,sched}|_{max}$ (12.18) 

12.3. Input Unit (IU) 

12.3.1. PD_Rdy1_Dly 

PD_Rdy1_Dly denotes the expected duration of the first observation window during Local Diagnosis 
Acquisition (a.k.a. Preliminary Diagnosis). 

PD_Rdy1_Dly = $\Delta_{PD,OW}$ (12.19) 

The current design of the RPP imposes the following constraint. 

PD_Rdy1_Dly > 1 (12.20) 

12.3.2. PD_Rdy2_Dly 

PD_Rdy2_Dly denotes the expected duration of the second observation window during Preliminary 
Diagnosis. 

PD_Rdy2_Dly = $\Delta_{PD,OW}$ (12.21) 

The current design of the RPP imposes the following constraint. 

PD_Rdy2_Dly > 1 (12.22) 


12.3.3. ID_Wnd_Dly 

ID_Wnd_Dly denotes the delay to open the reception window in process P1 of the Initial Diagnosis 
protocol. This delay is measured from the clock edge immediately following the issue of the command 
by the MCU. 

ID_Wnd_Dly = $\Delta_{ID,P1,RcvWnd}$ - 1 (12.23) 

The current design of the RPP imposes the following constraint. 

ID_Wnd_Dly > 0 (12.24) 


12.3.4. ID_Wnd_Sz 

ID_Wnd_Sz denotes the size of the reception window in process P1 of the Initial Diagnosis protocol. 

ID_Wnd_Sz = $W_{ID,Deskew}$ (12.25) 

The current design of the RPP imposes the following constraint. 

ID_Wnd_Sz > 1 (12.26) 

12.3.5. IS_Wnd_Dly 

IS_Wnd_Dly denotes the delay to open the reception window for the Initial Synchronization protocol. 
This delay is measured from the clock edge at which the reception window of the Initial Diagnosis 
protocol closes to the clock edge at which the reception window of the Initial Synchronization protocol 
opens. 

IS_Wnd_Dly = Ci DjP i + A IDP1C . END + A IS _ begill + A is , PPRCV wnd (12.27) 

The current design of the RPP imposes the following constraint. 

IS_Wnd_Dly > 1 (12.28) 


12.3.6. SP_INIT_Wnd_Dly(Node_Kind) 

SP_INIT_Wnd_Dly denotes the delay to open the reception window for receiving INIT messages 
during the execution of the Synchronization Preservation protocol. This delay is measured from the clock 
edge immediately following the edge at which the MCU issues the command to the clock edge at which 
the reception window opens. 





12.3.6.1. RMU 


The RMUs receive INIT messages in process P1. 

SP_INIT_Wnd_Dly(RMU) = B sp ,po + R PP - W Deskew , pre - 1 (12.29) 

The current design of the RPP imposes the following constraint. 

SP_INIT_Wnd_Dly(RMU) > 0 (12.30) 

12.3.6.2. BIU 

The BIUs receive INIT messages in process P2. 

SP_INIT_Wnd_Dly(BIU) = $B_{SP,P0}$ + $R_{SP,P0-P2}$ - $\Delta_{SP,P2,Rcv}|_{max}$ - 1 (12.31) 

Note that the reception variables for process P2 are computed using $\pi_{SP,P0|P2,Rcv}$ in consideration of the 
joining BIUs that synchronize using the Synchronization Capture protocol during the previous execution 
of the Synchronization Preservation protocol (see Section 5.10.2). 

The current design of the RPP imposes the following constraint. 

SP_INIT_Wnd_Dly(BIU) > 0 (12.32) 

12.3.7. SP_INIT_Wnd_Sz(Node_Kind) 

SP_INIT_Wnd_Sz denotes the size of the reception window for receiving INIT messages during the 
execution of the Synchronization Preservation protocol. 


12.3.7.1. RMU 

For the RMUs in process P1: 

SP_INIT_Wnd_Sz(RMU) = $W_{Deskew}$ (12.33) 

The current design of the RPP imposes the following constraint. 

SP_INIT_Wnd_Sz(RMU) > 1 (12.34) 

12.3.7.2. BIU 

For the BIUs in process P2: 

SP_INIT_Wnd_Sz(BIU) = 2$\Delta_{SP,P2,Rcv}|_{max}$ + 1 (12.35) 

Note that the reception variables for process P2 are computed using $\pi_{SP,P0|P2,Rcv}$ in consideration of the 
joining BIUs that synchronize using the Synchronization Capture protocol during the previous execution 
of the Synchronization Preservation protocol (see Section 5.10.2). 

The current design of the RPP imposes the following constraint. 

SP_INIT_Wnd_Sz(BIU) > 1 (12.36) 

12.3.8. SP_ECHO_Wnd_Dly(Node_Kind) 

SP_ECHO_Wnd_Dly denotes the delay to open the reception window for receiving ECHO messages 
during the execution of the Synchronization Preservation protocol. This delay is measured from the clock 
edge immediately following the edge at which the Accept(INIT) output is asserted to the clock edge at 
which the reception window opens. 


12.3.8.1. RMU 

The RMUs receive ECHO messages in process P3. 

SP_ECHO_Wnd_Dly(RMU) = $R_{SP,P1-P3}$ - $\Delta_{SP,P3,Rcv}|_{max}$ - 1 (12.37) 

The current design of the RPP imposes the following constraint. 

SP_ECHO_Wnd_Dly(RMU) > 0 (12.38) 

12.3.8.2. BIU 

The BIUs receive ECHO messages in process P4. 

SP_ECHO_Wnd_Dly(BIU) = $R_{SP,P2-P4}$ - $\Delta_{SP,P4,Rcv}|_{max}$ - 1 (12.39) 

The current design of the RPP imposes the following constraint. 

SP_ECHO_Wnd_Dly(BIU) > 0 (12.40) 

12.3.9. SP_ECHO_Wnd_Sz(Node_Kind) 

SP_ECHO_Wnd_Sz denotes the size of the reception window for receiving ECHO messages during 
the execution of the Synchronization Preservation protocol. 


12.3.9.1. RMU 

For the RMUs in process P3: 

SP_ECHO_Wnd_Sz(RMU) = 2$\Delta_{SP,P3,Rcv}|_{max}$ + 1 (12.41) 

The current design of the RPP imposes the following constraint. 

SP_ECHO_Wnd_Sz(RMU) > 1 (12.42) 

12.3.9.2. BIU 

For the BIUs in process P4: 

SP_ECHO_Wnd_Sz(BIU) = 2$\Delta_{SP,P4,Rcv}|_{max}$ + 1 (12.43) 

The current design of the RPP imposes the following constraint. 

SP_ECHO_Wnd_Sz(BIU) > 1 (12.44) 

12.3.10. INIT_Pls_Dly(Node_Kind) 

INIT_Pls_Dly denotes the delay from the clock edge at which an INIT message is received to the 
clock edge at which the corresponding event is presented at the output of the Input Unit. These INIT 
signals are generated by the Input Unit’s Sync Delay subunits, whose outputs are directly connected to the 
outputs of the Input Unit. For proper operation, the INIT outputs should be asserted at least one tick after 
the READY output is asserted during Synchronization Preservation. The READY output is asserted 
SP_INIT_Wnd_Sz ticks after the opening of the reception window. 


12.3.10.1. RMU 

For the RMUs in process P1: 

INIT_Pls_Dly(RMU) = SP_INIT_Wnd_Sz(RMU) + 1 (12.45) 

The current design of the RPP imposes the following constraint. 

INIT_Pls_Dly(RMU) > 1 (12.46) 

12.3.10.2. BIU 

For the BIUs in process P2: 

INIT_Pls_Dly(BIU) = SP_INIT_Wnd_Sz(BIU) + 1 (12.47) 

The current design of the RPP imposes the following constraint. 

INIT_Pls_Dly(BIU) > 1 (12.48) 
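
The relationship between the BIU reception window delay, the window size, and the INIT pulse delay can be sketched in Python as follows. This is an illustrative calculation, not part of the RPP design: the function and argument names are hypothetical, and the formulas follow one reading of Equations (12.31), (12.35), and (12.47), in which the window opens the maximum reception uncertainty before the expected arrival time and extends the same amount after it. That reading of the symbols is an assumption.

```python
from typing import NamedTuple


class Window(NamedTuple):
    delay: int        # SP_INIT_Wnd_Dly(BIU), in ticks
    size: int         # SP_INIT_Wnd_Sz(BIU), in ticks
    pulse_delay: int  # INIT_Pls_Dly(BIU), in ticks


def sp_init_window_biu(b_p0: int, r_p0_p2: int, delta_max: int) -> Window:
    """Compute the BIU INIT reception window parameters.

    b_p0      : start offset of process P0 (assumed meaning of B)
    r_p0_p2   : expected P0-to-P2 reception delay (assumed meaning of R)
    delta_max : maximum reception uncertainty for process P2
    """
    delay = b_p0 + r_p0_p2 - delta_max - 1  # Equation (12.31)
    size = 2 * delta_max + 1                # Equation (12.35)
    pulse_delay = size + 1                  # Equation (12.47)
    return Window(delay, size, pulse_delay)
```

With the arbitrary illustrative values b_p0 = 2, r_p0_p2 = 10, and delta_max = 3, the sketch gives a window delay of 8 ticks, a window size of 7 ticks, and an INIT pulse delay of 8 ticks, so the INIT output is asserted one tick after the window closes.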

12.3.11. INIT_Skw(Node_Kind) 

INIT_Skw denotes the maximum valid observed skew between INIT messages from trustworthy 
nodes during the Synchronization Preservation protocol. 


12.3.11.1. RMU 

For RMUs in process P1: 

INIT_Skw(RMU) = $\pi_{SP,P1,Rcv}$ (12.49) 

The current design of the RPP imposes the following constraint. 

INIT_Skw(RMU) > 1 (12.50) 

12.3.11.2. BIU 

For BIUs in process P2: 

INIT_Skw(BIU) = $\pi_{SP,P2,Rcv}$ (12.51) 

The current design of the RPP imposes the following constraint. 

INIT_Skw(BIU) > 1 (12.52) 

12.3.12. ECHO_Pls_Dly(Node_Kind) 

ECHO_Pls_Dly denotes the delay from the clock edge at which an ECHO message is received to the 
clock edge at which the corresponding event is presented at the output of the Input Unit. These ECHO 
signals are generated by the Input Unit’s Sync Delay subunits, whose outputs are directly connected to the 
outputs of the Input Unit. For proper operation, the ECHO outputs should be asserted at least one tick 
after the READY output is asserted during Synchronization Preservation. The READY output is asserted 
SP_ECHO_Wnd_Sz ticks after the opening of the reception window. 


12.3.12.1. RMU 

For the RMUs in process P3: 

ECHO_Pls_Dly(RMU) = SP_ECHO_Wnd_Sz(RMU) + 1 (12.53) 

The current design of the RPP imposes the following constraint. 

ECHO_Pls_Dly(RMU) > 1 (12.54) 

12.3.12.2. BIU 


For the BIUs in process P4: 

ECHO_Pls_Dly(BIU) = SP_ECHO_Wnd_Sz(BIU) + 1 (12.55) 

The current design of the RPP imposes the following constraint. 

ECHO_Pls_Dly(BIU) > 1 (12.56) 

12.3.13. ECHO_Skw(Node_Kind) 

ECHO_Skw denotes the maximum valid observed skew between ECHO messages from trustworthy 
nodes during the Synchronization Preservation, Initial Synchronization, and Synchronization Capture 
protocols. 


12.3.13.1. RMU 

For RMUs in process P3 or P3C: 

ECHO_Skw(RMU) = \Pi_{SP,P3,RCV} (12.57)

The current design of the RPP imposes the following constraint.

ECHO_Skw(RMU) > 1 (12.58)

12.3.13.2. BIU 

For BIUs in process P4 or P4C: 

ECHO_Skw(BIU) = \Pi_{SP,P4,RCV} (12.59)

The current design of the RPP imposes the following constraint. 

ECHO_Skw(BIU) > 1 (12.60) 

12.3.14. CD_Wnd1_Dly

CD_Wnd1_Dly denotes the delay to open the reception window for receiving messages in process P1
during the execution of Collective Diagnosis or Collective Diagnosis Acquisition. This delay is measured
from the clock edge immediately following the edge at which the MCU issues the command to the clock
edge at which the reception window opens.

CD_Wnd1_Dly = \Delta_{CD,P1,RCVWND} - 1 (12.61)

The current design of the RPP imposes the following constraint.

CD_Wnd1_Dly > 0 (12.62)


12.3.15. CD_Wnd2_Dly 

CD_Wnd2_Dly denotes the delay to open the reception window for receiving messages in processes 
P2, P3, and P4 during the execution of Collective Diagnosis or Collective Diagnosis Acquisition. This 
delay is measured from the clock edge at which the previous receive window is closed until the edge at 
which the next window opens. 


CD_Wnd2_Dly = \Delta_{CD,P2,RCVWND} (12.63)


The current design of the RPP imposes the following constraint. 

CD_Wnd2_Dly > 1 (12.64) 


12.3.16. SU_P1_Wnd1_Dly(Node_Kind)

SU_P1_Wnd1_Dly denotes the delay to open the reception window for receiving the first set of
messages during the execution of the Schedule Update protocol. This delay is measured from the clock
edge immediately following the edge at which the MCU issues the command to the clock edge at which
the reception window opens.


12.3.16.1. RMU

For RMUs in process P1:

SU_P1_Wnd1_Dly(RMU) = \Delta_{SU,P1,RCVWND} - 1 (12.65)

The current design of the RPP imposes the following constraint.

SU_P1_Wnd1_Dly(RMU) > 0 (12.66)

12.3.16.2. BIU

For BIUs in process P2:

SU_P1_Wnd1_Dly(BIU) = \Delta_{SU,P2,RCVWND} - 1 (12.67)

The current design of the RPP imposes the following constraint.

SU_P1_Wnd1_Dly(BIU) > 0 (12.68)

12.3.17. SU_P2_Wnd1_Dly(Node_Kind)

SU_P2_Wnd1_Dly denotes the delay to open the reception window for receiving the second message
stream in the execution of the Schedule Update protocol. This delay is measured from the clock edge at
which the receive window is closed for the last message of the first stream to the clock edge at which the
receive window is opened for the first message of the second stream.


12.3.17.1. RMU

For RMUs in process P3:

SU_P2_Wnd1_Dly(RMU) = (T_{SU,P3,RCV,E,0} - W_{Deskew,pre}) - (T_{SU,P1,RCV,E,N-1} + W_{Deskew,post}) (12.69)

The current design of the RPP imposes the following constraint.

SU_P2_Wnd1_Dly(RMU) > 1 (12.70)

12.3.17.2. BIU

For BIUs in process P4:

SU_P2_Wnd1_Dly(BIU) = (T_{SU,P4,RCV,E,0} - W_{Deskew,pre}) - (T_{SU,P2,RCV,E,N-1} + W_{Deskew,post}) (12.71)

The current design of the RPP imposes the following constraint.

SU_P2_Wnd1_Dly(BIU) > 1 (12.72)


12.3.18. SU_Wnd2_Dly

SU_Wnd2_Dly denotes the time gap between consecutive reception windows of a message stream,
and it applies to the first and second message streams during the execution of the Schedule Update
protocol. This delay is measured from the clock edge at which the reception window of a message ends
to the edge at which the next reception window of the stream begins. This delay applies equally to BIUs
and RMUs in processes P1, P2, P3, and P4. If the reception windows overlap, this variable is set to 0.

SU_Wnd2_Dly = \Delta_{SU} - W_{Deskew},  for \Delta_{SU} - W_{Deskew} > 0
SU_Wnd2_Dly = 0,                         for \Delta_{SU} - W_{Deskew} \leq 0 (12.73)

The current design of the RPP imposes the following constraint.

SU_Wnd2_Dly \geq 0 (12.74)
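
The rule above amounts to a subtraction clamped at zero. The following is a minimal sketch of it in Python; the function name and arguments are hypothetical (they are not part of the RPP VHDL description), and both arguments are assumed to be integer tick counts.

```python
def window_gap(dii: int, w_deskew: int) -> int:
    """Gap in ticks between consecutive reception windows.

    dii      -- data introduction interval of the stream (assumed name)
    w_deskew -- width of the deskew reception window (assumed name)
    Returns 0 when the windows touch or overlap.
    """
    return max(dii - w_deskew, 0)


print(window_gap(10, 4))  # windows separated by a positive gap
print(window_gap(3, 4))   # overlapping windows
```

The same clamp applies to any stream whose gap is defined as the DII minus the deskew window width.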


12.3.19. SU_DII 

SU_DII denotes the data introduction interval for the streams of the Schedule Update protocol.

SU_DII = \Delta_{SU} (12.75)

The current design of the RPP imposes the following constraint.

SU_DII > 1 (12.76) 

12.3.20. SU_Max_Buff_Cnt 

SU_Max_Buff_Cnt denotes the maximum input buffer load for normal operation when the reception 
windows overlap during the execution of the Schedule Update protocol. To specify the value of this 
variable we leverage the timing analysis for point-to-point communication of message streams. The 
current version of the RPP uses synchronous FIFOs as input buffers. Based on the analysis in Section 4 
of this document: 

SU_Max_Buff_Cnt = K - \lfloor (K - 1)/(1 + \rho_0)^2 - (\Delta_{PP,RCV}|_{abs-max} + \Delta_{PROC,begin})/\Delta_{stream} + 1 \rfloor (12.77)

For:

K = N,
\Delta_{PROC,begin} = \Delta_{PP,RCV}|_{abs-max} + 1, and
\Delta_{stream} = \Delta_{SU},

then:

SU_Max_Buff_Cnt = N - \lfloor (N - 1)/(1 + \rho_0)^2 - W_{Deskew}/\Delta_{SU} + 1 \rfloor (12.78)

The current design of the RPP imposes the following constraint.

SU_Max_Buff_Cnt > 1 (12.79)
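
As an illustration of the specialized formula for the Schedule Update stream, the sketch below evaluates N minus the floor of (N - 1)/(1 + rho_0)^2 - W_Deskew/Delta_SU + 1. The function and argument names are hypothetical, and the numeric inputs are made-up sample values, not parameters of any actual ROBUS configuration.

```python
import math


def su_max_buff_cnt(n: int, rho0: float, w_deskew: int, delta_su: int) -> int:
    """Maximum input buffer load during Schedule Update (sketch).

    n        -- number of messages in the stream (N, the number of BIUs)
    rho0     -- oscillator drift bound, unitless
    w_deskew -- deskew window width in ticks
    delta_su -- data introduction interval of the stream in ticks
    """
    inner = (n - 1) / (1 + rho0) ** 2 - w_deskew / delta_su + 1
    return n - math.floor(inner)


# Sample values only: 4 BIUs, 100 ppm drift, 6-tick window, 4-tick DII.
print(su_max_buff_cnt(4, 1e-4, 6, 4))
```

With a window wider than the DII, consecutive windows overlap and more than one message can be pending in the FIFO, which is what the result reflects.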


12.3.21. PE_Wnd1_Dly(Node_Kind)

PE_Wnd1_Dly denotes the delay to open the reception window for receiving the first message during
the execution of the PE Communication protocols. This delay is measured from the clock edge
immediately following the edge at which the MCU issues the command to the clock edge at which the
reception window opens.


12.3.21.1. RMU

For RMUs in process P1:

PE_Wnd1_Dly(RMU) = \Delta_{PE,P1,RCVWND,sched} - 1 (12.80)

The current design of the RPP imposes the following constraint.

PE_Wnd1_Dly(RMU) > 0 (12.81)

12.3.21.2. BIU

For BIUs in process P2:

PE_Wnd1_Dly(BIU) = \Delta_{PE,P2,RCVWND,sched} - 1 (12.82)

The current design of the RPP imposes the following constraint.

PE_Wnd1_Dly(BIU) > 0 (12.83)


12.3.22. PE_Wnd2_Dly

PE_Wnd2_Dly denotes the time gap between consecutive reception windows of the message stream
during the execution of the PE Communication protocols. This delay is measured from the clock edge at
which the previous receive window is closed to the edge at which the next window opens. This delay
applies equally to BIUs and RMUs. If the windows overlap, this variable should be set to 0.

PE_Wnd2_Dly = \Delta_{PE,sched} - W_{Deskew},  for \Delta_{PE,sched} - W_{Deskew} > 0
PE_Wnd2_Dly = 0,                               for \Delta_{PE,sched} - W_{Deskew} \leq 0 (12.84)

The current design of the RPP imposes the following constraint.

PE_Wnd2_Dly \geq 0 (12.85)


12.3.23. PE_DII 

PE_DII denotes the data introduction interval for the PE Communication protocol. 

PE_DII = \Delta_{PE,sched} (12.86)

The current design of the RPP imposes the following constraint. 

PE_DII > 1 (12.87) 

12.3.24. PE_Max_Buff_Cnt 

PE_Max_Buff_Cnt denotes the maximum input buffer load for normal operation when the reception
windows overlap during the execution of the PE Communication protocols. To specify the value of this 
variable we leverage the timing analysis for point-to-point communication of message streams. The 
current version of the RPP uses synchronous FIFOs as input buffers. 

PE_Max_Buff_Cnt = K - \lfloor (K - 1)/(1 + \rho_0)^2 - (\Delta_{PP,RCV}|_{abs-max} + \Delta_{PROC,begin})/\Delta_{stream} + 1 \rfloor (12.88)

For:

K = K_{PE,sched}|_{max} + 1,
\Delta_{PROC,begin} = \Delta_{PP,RCV}|_{abs-max} + 1, and
\Delta_{stream} = \Delta_{PE,sched},

then:

PE_Max_Buff_Cnt = K_{PE,sched}|_{max} + 1 - \lfloor K_{PE,sched}|_{max}/(1 + \rho_0)^2 - W_{Deskew}/\Delta_{PE,sched} + 1 \rfloor (12.89)

Since the schedule can be dynamically updated, the actual number of messages in the stream (denoted 
generically by K) can vary from cycle to cycle. Here, we use the largest allowed stream during PE 
Communication as the reference to compute PE_Max_Buff_Cnt. 

The value K_{PE,sched}|_{max} + 1 includes the accusations message sent after the PE messages. (Note that this
assumes that the accusations message will be sent with the same DII as the rest of the stream.) 

The current design of the RPP imposes the following constraint. 

PE_Max_Buff_Cnt > 1 (12.90) 


12.3.25. Syncns_Wnd_Sz 

Syncns_Wnd_Sz denotes the synchronous reception window size for the time-driven protocols.

Syncns_Wnd_Sz = W_{Deskew} (12.91)

The current design of the RPP imposes the following constraint. 

Syncns_Wnd_Sz > 1 (12.92) 


12.3.26. Frm_Sync_Gap(Node_Kind) 

Frm_Sync_Gap denotes the time between valid ECHO messages that a node must search for in order 
to achieve frame synchronization during the execution of the Synchronization Acquisition protocol. This 
time is equal to the bound on the observed relative skew of received ECHO messages from trustworthy 
nodes. 


12.3.26.1. RMU 

For RMUs in process P3C: 

Frm_Sync_Gap(RMU) = \Pi_{SP,P3C,RCV} (12.93)

The current design of the RPP imposes the following constraint. 

Frm_Sync_Gap(RMU) > 1 (12.94) 
12.3.26.2. BIU


For BIUs in process P4C: 

Frm_Sync_Gap(BIU) = \Pi_{SP,P4C,RCV} (12.95)

The current design of the RPP imposes the following constraint. 

Frm_Sync_Gap(BIU) > 1 (12.96) 

12.4. Input Diagnostics Unit (IDU) 

12.4.1. PD_ECHO_Cnt1_Lo

PD_ECHO_Cnt1_Lo denotes the minimum number of ECHO messages that is expected from a node
of the opposite kind during the first observation phase of Preliminary Diagnosis (a.k.a. Local Diagnosis
Acquisition).

PD_ECHO_Cnt1_Lo = \lfloor (\Delta_{PD,OW} - 1)/[(1 + \rho_0) P_{max}] \rfloor (12.97)

The current design of the RPP imposes the following constraint.

PD_ECHO_Cnt1_Lo > 0 (12.98)

12.4.2. PD_ECHO_Cnt1_Hi

PD_ECHO_Cnt1_Hi denotes the maximum number of ECHO messages that is expected from a node
of the opposite kind during the first observation phase of Preliminary Diagnosis.

PD_ECHO_Cnt1_Hi = \lfloor (1 + \rho_0)(\Delta_{PD,OW} - 1)/P_{min} \rfloor + 1 (12.99)

The current design of the RPP imposes the following constraint.

PD_ECHO_Cnt1_Hi > 0 (12.100)

12.4.3. PD_ECHO_Cnt2_Lo

PD_ECHO_Cnt2_Lo denotes the minimum total number of ECHO messages that is expected from a
node of the opposite kind by the end of the second observation phase during Preliminary Diagnosis.

PD_ECHO_Cnt2_Lo = \lfloor (2\Delta_{PD,OW} - 1)/[(1 + \rho_0) P_{max}] \rfloor (12.101)

The current design of the RPP imposes the following constraint.

PD_ECHO_Cnt2_Lo > 0 (12.102)
12.4.4. PD_ECHO_Cnt2_Hi

PD_ECHO_Cnt2_Hi denotes the maximum total number of ECHO messages that is expected from a
node of the opposite kind by the end of the second observation phase during Preliminary Diagnosis.

PD_ECHO_Cnt2_Hi = \lfloor (1 + \rho_0)(2\Delta_{PD,OW} - 1)/P_{min} \rfloor + 1 (12.103)

The current design of the RPP imposes the following constraint.

PD_ECHO_Cnt2_Hi > 0 (12.104)
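
The four Preliminary Diagnosis bounds share one pattern, sketched below in Python. This is an illustration only: the function and argument names are hypothetical, and it assumes that the denominators of the Lo and Hi formulas are P_max and P_min respectively (the Hi denominator is not fully legible in the source formulas). The sample inputs are arbitrary.

```python
import math


def pd_echo_count_bounds(delta_ow: int, rho0: float, p_min: int, p_max: int,
                         windows: int = 1) -> tuple[int, int]:
    """Return (Lo, Hi) ECHO-count bounds after `windows` observation windows.

    delta_ow -- length of one observation window in ticks
    rho0     -- oscillator drift bound, unitless
    p_min    -- assumed shortest interval between ECHO messages, in ticks
    p_max    -- assumed longest interval between ECHO messages, in ticks
    windows  -- 1 for the Cnt1 bounds, 2 for the Cnt2 bounds
    """
    span = windows * delta_ow - 1
    lo = math.floor(span / ((1 + rho0) * p_max))
    hi = math.floor((1 + rho0) * span / p_min) + 1
    return lo, hi


print(pd_echo_count_bounds(1000, 1e-4, 99, 101))             # Cnt1 bounds
print(pd_echo_count_bounds(1000, 1e-4, 99, 101, windows=2))  # Cnt2 bounds
```

The drift factor widens the interval on both sides: it lowers the minimum count and raises the maximum count.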

12.5. Route-and-Vote Unit (RVU) 

12.5.1. INIT_Skw(Node_Kind) 

INIT_Skw denotes the maximum valid observed skew between INIT messages from trustworthy
nodes during the Synchronization Preservation protocol. This parameter is also used in the Input Unit.

12.5.1.1. RMU

For RMUs in process P1:

INIT_Skw(RMU) = \Pi_{SP,P1,RCV} (12.105)

The current design of the RPP imposes the following constraint.

INIT_Skw(RMU) > 1 (12.106)

12.5.1.2. BIU

For BIUs in process P2:

INIT_Skw(BIU) = \Pi_{SP,P2,RCV} (12.107)

The current design of the RPP imposes the following constraint.

INIT_Skw(BIU) > 1 (12.108)

12.5.2. ECHO_Skw(Node_Kind)

ECHO_Skw denotes the maximum valid observed skew between ECHO messages from trustworthy
nodes during the Synchronization Preservation, Initial Synchronization, and Synchronization Capture
protocols. This parameter is also used in the Input Unit.

12.5.2.1. RMU


For RMUs in process P3 or P3C: 

ECHO_Skw(RMU) = \Pi_{SP,P3,RCV} (12.109)

The current design of the RPP imposes the following constraint.

ECHO_Skw(RMU) > 1 (12.110)

12.5.2.2. BIU 

For BIUs in process P4 or P4C: 

ECHO_Skw(BIU) = \Pi_{SP,P4,RCV} (12.111)

The current design of the RPP imposes the following constraint.

ECHO_Skw(BIU) > 1 (12.112)

12.5.3. SCIS_Sync_Rst_Dly(Node_Kind)

SCIS_Sync_Rst_Dly denotes the sync reset delay for the Synchronization Capture and Initial
Synchronization protocols. This delay is measured from the time an Accept(ECHO) output is asserted
until the Sync_Reset signal is asserted by the Route-and-Vote Unit. The Local Time is set to 0 at the next
clock edge.


12.5.3.1. RMU 

For RMUs in process P3 or P3C: 

SCIS_Sync_Rst_Dly(RMU) = H_{P3} - 1 (12.113)

The current design of the RPP imposes the following constraint.

SCIS_Sync_Rst_Dly(RMU) > 1 (12.114)


12.5.3.2. BIU

For BIUs in process P4 or P4C:

SCIS_Sync_Rst_Dly(BIU) = H_{P4} - 1 (12.115)

The current design of the RPP imposes the following constraint.

SCIS_Sync_Rst_Dly(BIU) > 1 (12.116)


12.5.4. SP_Sync_Rst_Dly(Node_Kind) 

SP_Sync_Rst_Dly denotes the sync reset delay for the Synchronization Preservation protocol. This
delay is measured from the time an Accept output is asserted until the Sync_Reset signal is asserted by 
the Route-and-Vote Unit. The Local Time is set to 0 at the next clock edge. 


12.5.4.1. RMU 

For Synchronization Preservation, the RMUs synchronize their Local Time with respect to the output 
of Accept(ECHO) in process P3 or P3C. 

SP_Sync_Rst_Dly(RMU) = H_{P3} - 1 (12.117)

The current design of the RPP imposes the following constraint.

SP_Sync_Rst_Dly(RMU) > 1 (12.118)


12.5.4.2. BIU 

For Synchronization Preservation, the BIUs synchronize their Local Time with respect to the output
of Accept(INIT) in process P2.

SP_Sync_Rst_Dly(BIU) = H_{P2} - 1 (12.119)

The current design of the RPP imposes the following constraint. 

SP_Sync_Rst_Dly(BIU) > 1 (12.120) 


12.6. Status Monitoring Unit (SMU) 

12.6.1. SA_Timeout 

SA_Timeout denotes the timeout delay for the Synchronization Acquisition sequence, which includes 
the Frame Synchronization and Synchronization Capture protocols. 

SA_Timeout = \Delta_{SA}|_{max} (12.121)

The current design of the RPP imposes the following constraint. 

SA_Timeout > 1 (12.122) 
12.6.2. IS_Timeout


IS_Timeout denotes the timeout delay for the Initial Synchronization protocol. For the current version
of the ROBUS protocol processor, this is measured from the clock edge at which the Computation
Process outputs the result for Initial Diagnosis to the synchronization reset during Initial Synchronization.

IS_Timeout = \Delta_{ID,P1,C-END} + \Delta_{IS,begin} + \Delta_{IS}|_{max} (12.123)

The current design of the RPP imposes the following constraint. 

IS_Timeout > 1 (12.124) 

12.6.3. SP_Timeout 

SP_Timeout denotes the timeout delay for the resynchronization interval. 

SP_Timeout = P (12.125) 

The current design of the RPP imposes the following constraint. 

SP_Timeout > 1 (12.126) 

12.7. Output Unit (OU) 

To determine the parameter values for the Output Unit, we must take into account that there is
a one-tick delay from the strobe generated by the OU controller to the strobe sent to the Communication
Module.

12.7.1. ID_Snd_Dly 

ID_Snd_Dly denotes the send delay for Initial Diagnosis. This delay is measured from the clock edge 
at which the MCU issues the command to one tick before the message is sent. 

ID_Snd_Dly = S_{ID,P0} - 1 (12.127)

The current design of the RPP imposes the following constraint. 

ID_Snd_Dly > 0 (12.128) 

12.7.2. IS_INIT_Snd_Dly(Node_Kind) 

IS_INIT_Snd_Dly denotes the send delay for the INIT message during Initial Synchronization. 
12.7.2.1. RMU

For RMUs in process P1, this delay is measured from the time the output of Accept(INIT) is asserted
to one tick before the OU controller asserts its strobe.

IS_INIT_Snd_Dly(RMU) = B_{P1} - 1 (12.129)

The current design of the RPP imposes the following constraint.

IS_INIT_Snd_Dly(RMU) > 0 (12.130)

12.7.2.2. BIU

For BIUs in process P0, this delay is measured from the clock edge at which the controller generates
the strobe for the Initial Diagnosis message to the clock edge at which the controller generates the strobe
for the INIT message.

IS_INIT_Snd_Dly(BIU) = (T_{IS,P0,SND} - 1) - (T_{ID,P0,SND} - 1) (12.131)

= R_{PP} + W_{ID,Deskew,post} + C_{ID,P1} + \Delta_{ID,P1,C-END} + \Delta_{IS,begin} + B_{IS,P0} (12.132)

The current design of the RPP imposes the following constraint.

IS_INIT_Snd_Dly(BIU) > 1 (12.133)

12.7.3. SP_INIT_Snd_Dly(Node_Kind)

SP_INIT_Snd_Dly denotes the send delay for the INIT messages during the execution of the
Synchronization Preservation protocol.

12.7.3.1. RMU 

For RMUs in process P1, this delay is measured from the time the output of Accept(INIT) is asserted
to one tick before the message is sent.

SP_INIT_Snd_Dly(RMU) = B_{P1} - 1 (12.134)

The current design of the RPP imposes the following constraint.

SP_INIT_Snd_Dly(RMU) > 0 (12.135)

12.7.3.2. BIU

For BIUs in process P0, this delay is measured from the time at which the MCU issues the command
to one tick before the time at which the INIT message is sent.

SP_INIT_Snd_Dly(BIU) = B_{SP,P0} - 1 (12.136)

The current design of the RPP imposes the following constraint.

SP_INIT_Snd_Dly(BIU) > 0 (12.137)

12.7.4. ECHO_Snd_Dly(Node_Kind) 

ECHO_Snd_Dly denotes the send delay for the ECHO messages during the execution of the 
synchronization protocols. 


12.7.4.1. RMU 

For RMUs in process P3, this delay is measured from the time the output of Accept(ECHO) is asserted 
to one tick before the message is sent. 

ECHO_Snd_Dly(RMU) = B_{P3} - 1 (12.138)

The current design of the RPP imposes the following constraint.

ECHO_Snd_Dly(RMU) > 0 (12.139)

12.7.4.2. BIU

For BIUs in process P2, this delay is measured from the time the output of Accept(INIT) is asserted to
one tick before the time at which the ECHO message is sent.

ECHO_Snd_Dly(BIU) = B_{P2} - 1 (12.140)

The current design of the RPP imposes the following constraint.

ECHO_Snd_Dly(BIU) > 0 (12.141)

12.7.5. CD_Snd1_Dly

CD_Snd1_Dly denotes the send delay for the first message during the execution of Collective
Diagnosis and Collective Diagnosis Acquisition. This delay is measured from the clock edge at which the
MCU issues the command to one tick before the clock edge at which the message is sent.

CD_Snd1_Dly = S_{CD,P0} - 1 (12.142)

The current design of the RPP imposes the following constraint.

CD_Snd1_Dly > 1 (12.143)
12.7.6. CD_Snd2_Dly


CD_Snd2_Dly denotes the send delay for the second, third, and fourth messages during the execution
of Collective Diagnosis and Collective Diagnosis Acquisition. This delay is measured from the time the
controller generates the strobe for a message to one tick before the next message is to be sent.

CD_Snd2_Dly = (T_{CD,P1,SND} - 1) - (T_{CD,P0,SND} - 1)

= R_{PP} + W_{Deskew,post} + C_{CD,P1} + S_{CD,P2} (12.144)

The current design of the RPP imposes the following constraint.

CD_Snd2_Dly > 1 (12.145)


12.7.7. SU_P1_Snd1_Dly(Node_Kind)

SU_P1_Snd1_Dly denotes the send delay for the first message of the first pass in the Schedule Update
protocol. This delay is measured from the clock edge at which the MCU command is issued to one tick
before the clock edge at which the message is sent.


12.7.7.1. RMU

For RMUs in process P1:

SU_P1_Snd1_Dly(RMU) = T_{SU,P1,SND,0} - T_{SU} - 1 (12.146)

The current design of the RPP imposes the following constraint.

SU_P1_Snd1_Dly(RMU) > 1 (12.147)

12.7.7.2. BIU

For BIUs in process P0:

SU_P1_Snd1_Dly(BIU) = S_{SU,P0} - 1 (12.148)

The current design of the RPP imposes the following constraint.

SU_P1_Snd1_Dly(BIU) > 0 (12.149)


12.7.8. SU_P2_Snd1_Dly(Node_Kind)

SU_P2_Snd1_Dly denotes the send delay for the first message of the second stream in the Schedule
Update protocol. This delay is measured from the clock edge at which the controller generates the strobe
for the last message of the first stream to the clock edge at which the controller generates the strobe for
the first message of the second stream.

For the current design of the ROBUS protocol processor, the Output Unit includes a buffer to store the
messages of the second stream. At a minimum, this buffer introduces a one-tick delay to the processing
of each message. This extra delay can be assigned to either the computation delays for processes P2 and
P3 (i.e., C_{SU,P2} and C_{SU,P3}) or the minimum send delay for processes P2 and P3 (i.e., S_{SU,P2}|_{min} and
S_{SU,P3}|_{min}).


12.7.8.1. RMU

For RMUs in process P3:

SU_P2_Snd1_Dly(RMU) = \Delta_{SU,stream,P0-P2}|_{RMU} (12.150)

The current design of the RPP imposes the following constraint.

SU_P2_Snd1_Dly(RMU) > 1 (12.151)

12.7.8.2. BIU

For BIUs in process P2:

SU_P2_Snd1_Dly(BIU) = \Delta_{SU,stream,P0-P2}|_{BIU} (12.152)

The current design of the RPP imposes the following constraint.

SU_P2_Snd1_Dly(BIU) > 1 (12.153)

12.7.9. SU_DII 

SU_DII denotes the data introduction interval for the Schedule Update protocol. This is the same
parameter used in the Input Unit.

SU_DII = \Delta_{SU} (12.154)

The current design of the RPP imposes the following constraint. 

SU_DII > 1 (12.155) 

12.7.10. PE_Snd1_Dly(Node_Kind)

PE_Snd1_Dly denotes the send delay for the first message in PE Communication. This delay is
measured from the clock edge at which the command is issued to one tick before the clock edge at which
the message is sent.

12.7.10.1. RMU

For RMUs in process P1:

PE_Snd1_Dly(RMU) = T_{PE,P1,SND,sched,0} - T_{PE} - 1 (12.156)

The current design of the RPP imposes the following constraint.

PE_Snd1_Dly(RMU) > 1 (12.157)

12.7.10.2. BIU

For BIUs in process P0:

PE_Snd1_Dly(BIU) = S_{PE,P0,sched} - 1 (12.158)

The current design of the RPP imposes the following constraint.

PE_Snd1_Dly(BIU) > 1 (12.159)

12.7.11. PE_DII

PE_DII denotes the data introduction interval for PE Communication. This parameter is also used in
the Input Unit.

PE_DII = \Delta_{PE,sched} (12.160)

The current design of the RPP imposes the following constraint.

PE_DII > 1 (12.161)
13. Structural parameters


This section presents the formulas for the structural parameters in the VHDL description of the RPP. 
The parameters are grouped according to the corresponding RPP unit. Interface-level structural 
parameters are also defined. 


13.1. Interface-level structural parameters 

13.1.1. N 

N denotes the number of BIUs in the system. The RPP can operate in a system with one or more 
BIUs. 

N \geq 1 (13.1)

13.1.2. M 

M denotes the number of RMUs in the system. The RPP can operate in a system with one or more 
RMUs. 

M \geq 1 (13.2)

13.1.3. Num_Lnk_Synd 

Num_Lnk_Synd (= L_{LS}) denotes the number of syndrome inputs provided by the Communication
Module for each receiver port. The current description of the RPP assumes that there is at least one link
syndrome for each input port.

Num_Lnk_Synd \geq 1 (13.3)

For an implementation that does not have link syndromes, Num_Lnk_Synd should be set to one and 
then the input-syndrome signals should be set to the de-asserted value. 


13.1.4. Payload_Width 

Payload_Width (= L_{PF}) denotes the width of the Payload field for the ROBUS messages. This width
must be large enough to satisfy the following constraints: fit the Payload width requirement for PE
messages; fit the width required to send Collective Diagnosis messages; fit the width required for the
BIUs to send their Node_Id to the PEs; fit Max_Num_PE_Msg expressed in binary code; and fit a
different binary value for each SPECIAL ROBUS message.

Let Max_PE_Payload_Width denote the Payload width requirement for PE messages. The width
requirement for Collective Diagnosis messages is max(N, M). The width required for BIU node Ids is
\lceil \log_2(N) \rceil. (Note that N > \lceil \log_2(N) \rceil.) The minimum width requirement to fit the maximum number of PE
messages is \lceil \log_2(Max_Num_PE_Msg) \rceil. We would like Max_Payload to be greater than or equal to
Max_Num_PE_Msg in order to be able to allocate all the available bandwidth to a single PE.

Let Num_Special_Msg denote the number of SPECIAL ROBUS messages defined. The minimum
width required to assign a different binary value to each SPECIAL message is \lceil \log_2(Num_Special_Msg) \rceil.
Then:

Payload_Width \geq max(Max_PE_Payload_Width, N, M,
    \lceil \log_2(Max_Num_PE_Msg + 1) \rceil, \lceil \log_2(Num_Special_Msg) \rceil) (13.4)

13.1.5. Max_Payload 

Max_Payload denotes the maximum value of the Payload field in binary code. 

Max_Payload = 2^{Payload_Width} - 1 (13.5)
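
The width constraint and the resulting maximum payload value can be checked numerically. The sketch below is a hypothetical helper, not part of the RPP description; it uses integer arithmetic for the ceiling of the base-2 logarithm, and the sample inputs are arbitrary.

```python
def ceil_log2(x: int) -> int:
    """Smallest w such that 2**w >= x, for integer x >= 1."""
    return (x - 1).bit_length()


def min_payload_width(max_pe_payload_width: int, n: int, m: int,
                      max_num_pe_msg: int, num_special_msg: int) -> int:
    """Smallest Payload_Width satisfying the constraint on the payload width."""
    return max(max_pe_payload_width, n, m,
               ceil_log2(max_num_pe_msg + 1),
               ceil_log2(num_special_msg))


def max_payload(payload_width: int) -> int:
    """Largest value representable in the Payload field."""
    return 2 ** payload_width - 1


# Arbitrary sample: 2-bit PE payloads, 4 BIUs, 3 RMUs, 100 PE messages, 5 SPECIAL messages.
width = min_payload_width(2, 4, 3, 100, 5)
print(width, max_payload(width))
```

In this sample the message-count term dominates, and the resulting Max_Payload is at least Max_Num_PE_Msg, as the text requires.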

13.2. Mode Control Unit (MCU) 

13.2.1. Max_Diag_Cyc 

Max_Diag_Cyc denotes the maximum value of the Diagnostic Cycle counter. For the current version 
of the ROBUS protocols, a node in Clique Join mode goes through two Diagnostic Cycles before being 
allowed to join a clique. The Diagnostic Cycle counter counts up starting from 0. The value of this 
parameter must satisfy the following constraint. 

Max_Diag_Cyc \geq 1 (13.6)


13.2.2. Max_MCU_LT 

Max_MCU_LT denotes the maximum value of the Local Time counter. The Local Time counter must 
be able to count at least up to the maximum duration of a resynchronization cycle. 

Max_MCU_LT \geq P (13.7)

13.2.3. Max_MCU_Tmr 

Max_MCU_Tmr denotes the maximum value of the general-purpose timer in the MCU. The 
maximum count must be greater than or equal to the maximum loaded value. 

Max_MCU_Tmr \geq max(Min_Cmd_DII, ST_Xtra_Dly, ID_Dly) (13.8)
13.3. Schedule Processor


13.3.1. Max_PE_Msg_Cnt 

Max_PE_Msg_Cnt denotes the maximum value of the counter for the total number of scheduled PE 
messages. For this counter, we want to prevent an overflow condition, which could cause a missed 
detection of an excessive number of scheduled messages. 

Since the error detector has a latching output, we choose Max_PE_Msg_Cnt to be greater than or 
equal to Max_Payload. This ensures that the counter cannot overflow at the same time that its count 
exceeds Max_Num_PE_Msg. 

Max_PE_Msg_Cnt \geq 2*Max_Payload (13.9)


13.3.2. Sched_FIFO_Depth 

Sched_FIFO_Depth denotes the depth of the schedule memory. Since a schedule consists of N 
messages, the FIFO must be at least N deep. 

Sched_FIFO_Depth \geq N (13.10)


13.4. Input Unit (IU) 

13.4.1. Input_FIFO_Depth 

Input_FIFO_Depth denotes the depth of the input FIFO at each input port. This buffer must be large 
enough to hold at least one message more than the maximum loads for the Schedule Update and PE 
Communication protocols. The extra message is required to ensure that if an overload occurs, then the 
error is recorded in the buffer at least once. 

Input_FIFO_Depth \geq max(SU_Max_Buff_Cnt, PE_Max_Buff_Cnt) + 1 (13.11)
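
A minimal sketch of this sizing rule follows, with a hypothetical function name and arbitrary sample loads. The extra slot is the one that lets an overload be recorded in the buffer.

```python
def min_input_fifo_depth(su_max_buff_cnt: int, pe_max_buff_cnt: int) -> int:
    """Smallest input FIFO depth: worst-case protocol load plus one spare slot."""
    return max(su_max_buff_cnt, pe_max_buff_cnt) + 1


print(min_input_fifo_depth(2, 5))
```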


13.4.2. Max_Frm_Sync_Tmr 

Max_Frm_Sync_Tmr denotes the maximum value of the timer in the Frame Synchronizer. This timer 
is used to measure the time between ECHO messages. 

Max_Frm_Sync_Tmr \geq max(Frm_Sync_Gap(RMU), Frm_Sync_Gap(BIU)) (13.12)


13.4.3. Max_Sync_Dly_Tmr 

Max_Sync_Dly_Tmr denotes the maximum value for the timers in the INIT and ECHO Delay units. 
The same structural constraint is used since the same component is instantiated for processing INITs and 
ECHOs. The same timer is used to implement the pulse delay and the pulse width. 
Max_Sync_Dly_Tmr \geq max(INIT_Pls_Dly(RMU), INIT_Pls_Dly(BIU),
    ECHO_Pls_Dly(RMU), ECHO_Pls_Dly(BIU),
    INIT_Skw(RMU) + 1, INIT_Skw(BIU) + 1,
    ECHO_Skw(RMU) + 1, ECHO_Skw(BIU) + 1) (13.13)
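
Because one timer serves both the pulse delay and the pulse width, its range is the maximum over the delay values and the skew values plus one. The sketch below illustrates that computation; the function name, the dictionary layout keyed by node kind, and the sample values are all assumptions made for the example.

```python
NODE_KINDS = ("RMU", "BIU")


def min_sync_dly_tmr(init_pls_dly: dict, echo_pls_dly: dict,
                     init_skw: dict, echo_skw: dict) -> int:
    """Smallest Max_Sync_Dly_Tmr; each argument maps node kind to a tick count."""
    candidates = []
    for kind in NODE_KINDS:
        candidates.append(init_pls_dly[kind])   # pulse delays are used as-is
        candidates.append(echo_pls_dly[kind])
        candidates.append(init_skw[kind] + 1)   # skew values need one extra tick
        candidates.append(echo_skw[kind] + 1)
    return max(candidates)


print(min_sync_dly_tmr(
    init_pls_dly={"RMU": 5, "BIU": 6},
    echo_pls_dly={"RMU": 4, "BIU": 4},
    init_skw={"RMU": 6, "BIU": 3},
    echo_skw={"RMU": 2, "BIU": 2},
))
```

In this sample the INIT skew of the RMU plus one is the binding term, which shows why the skew entries cannot be dropped from the maximum.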


13.4.4. Max_IU_Cntrlr_Tmr 

Max_IU_Cntrlr_Tmr denotes the maximum value for the general-purpose timer in the IU controller. 
This timer is used to control the timing of the reception windows and the Ready signal. 

Max_IU_Cntrlr_Tmr \geq max(PD_Rdy1_Dly, PD_Rdy2_Dly, ID_Wnd_Dly,
    ID_Wnd_Sz, IS_Wnd_Dly,
    SP_INIT_Wnd_Dly(RMU), SP_INIT_Wnd_Dly(BIU),
    SP_INIT_Wnd_Sz(RMU), SP_INIT_Wnd_Sz(BIU),
    SP_ECHO_Wnd_Dly(RMU), SP_ECHO_Wnd_Dly(BIU),
    SP_ECHO_Wnd_Sz(RMU), SP_ECHO_Wnd_Sz(BIU),
    CD_Wnd1_Dly, CD_Wnd2_Dly,
    SU_P1_Wnd1_Dly(RMU), SU_P1_Wnd1_Dly(BIU),
    SU_P2_Wnd1_Dly(RMU), SU_P2_Wnd1_Dly(BIU),
    SU_Wnd2_Dly, SU_DII,
    PE_Wnd1_Dly(RMU), PE_Wnd1_Dly(BIU),
    PE_Wnd2_Dly, PE_DII, Syncns_Wnd_Sz) (13.14)


13.4.5. Max_IU_Cntrlr_Cntr 

Max_IU_Cntrlr_Cntr denotes the maximum value of the general-purpose counter in the IU controller. 
This counter is used to count the messages in a stream during the Schedule Update protocol. 

Max_IU_Cntrlr_Cntr \geq N (13.15)
13.5. Input Diagnostics Unit (IDU)


13.5.1. Max_Rate_Mon_Cntr 

Max_Rate_Mon_Cntr denotes the maximum value of the Rate Monitor counter. This counter is used 
to count the number of received ECHO messages during both Preliminary Diagnosis windows. The 
maximum value of the counter must be greater than or equal to the maximum number of expected 
messages. The counter has an overflow output to detect the arrival of more ECHO messages than its 
capacity. 

Max_Rate_Mon_Cntr \geq PD_ECHO_Cnt2_Hi (13.16)


13.5.2. Max_Seq_Mon_Cntr 

Max_Seq_Mon_Cntr denotes the maximum value of the Sequence Monitor counter. This counter is 
used to count the actual number of received messages out of the total number of expected messages 
during each of the Collective Diagnosis and Schedule Update protocols. 

Max_Seq_Mon_Cntr \geq max(4, N) (13.17)


13.6. Route-and-Vote Unit (RVU)


13.6.1. Max_RVU_Acpt_Tmr 

Max_RVU_Acpt_Tmr denotes the maximum value of the timer in the Accept function module. This 
module is instantiated for the Accept(INIT) and Accept(ECHO) functions. The timer is used to check the 
relative skew of the received messages with respect to the selected message. 

Max_RVU_Acpt_Tmr \geq max(INIT_Skw(RMU), INIT_Skw(BIU),
    ECHO_Skw(RMU), ECHO_Skw(BIU)) (13.18)


13.6.2. Max_RVU_Cntrlr_Tmr 

Max_RVU_Cntrlr_Tmr denotes the maximum value of the timer used by the RVU controller. This 
timer is used to implement the synchronization-reset delays for Synchronization Capture, Initial 
Synchronization, and Synchronization Preservation. 

Max_RVU_Cntrlr_Tmr \geq max(SCIS_Sync_Rst_Dly(RMU), SCIS_Sync_Rst_Dly(BIU),
    SP_Sync_Rst_Dly(RMU), SP_Sync_Rst_Dly(BIU)) (13.19)
13.7. Status Monitoring Unit (SMU)


13.7.1. Max_Timeout

Max_Timeout denotes the maximum value of the timeout timer.

Max_Timeout \geq max(SA_Timeout, IS_Timeout, SP_Timeout) (13.20)


13.7.2. Sent_FIFO_Depth 

Sent_FIFO_Depth denotes the depth of the SMU FIFO. This FIFO is used to buffer sent PE messages 
by a source BIU until the corresponding result of the PE Broadcast protocol is computed. 


Sent_FIFO_Depth \geq \lfloor [2(R_{PP} + W_{Deskew,post}) + S_{PE,P1,sched} + (C_{PE,P1,sched} + C_{PE,P2,sched}) + 1]/\Delta_{PE,sched} \rfloor + 1 (13.21)


13.8. Output Unit (OU) 


13.8.1. Output_FIFO_Depth 

Output_FIFO_Depth denotes the depth of the FIFO buffer in the Output Unit. This FIFO is used to
buffer RVU messages during Collective Diagnosis processes P1, P2, and P3; during Schedule Update
processes P1, P2, and P3; and during process P1 of PE Broadcast and Accusation Exchange. For the
Schedule Update protocol, this buffer must hold at most all the messages in a stream, i.e., N messages. For
Collective Diagnosis, the buffer must hold at most 1 message. For PE Communication, the load is determined by
the DII and the delay between the time when the RVU outputs the message and the time when the OU reads
the message.

Output_FIFO_Depth ≥ max(N, ceil((B_PE,P1,sched - 1) / Δ_PE,sched), ceil((B_PE,P1,acc - 1) / Δ_PE,sched)) (13.22)
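
The bound in (13.22) can be sketched in Python as follows. The function and argument names are hypothetical, the two delay arguments stand for the P1 send delays of the scheduled PE messages and of the accusations message, and the sample values are arbitrary. This is only an illustration of the arithmetic, not the script used by the authors.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for b > 0."""
    return -(-a // b)


def output_fifo_depth_bound(n, b_pe_p1_sched, b_pe_p1_acc, dii_pe_sched):
    """Lower bound on Output_FIFO_Depth following the structure of (13.22).

    n is the number of BIUs; the remaining arguments are integers in local
    clock ticks.
    """
    if dii_pe_sched < 1:
        raise ValueError("the data introduction interval must be at least 1 tick")
    return max(
        n,                                          # Schedule Update stream
        ceil_div(b_pe_p1_sched - 1, dii_pe_sched),  # scheduled PE messages
        ceil_div(b_pe_p1_acc - 1, dii_pe_sched),    # accusations message
    )


if __name__ == "__main__":
    # Arbitrary example values, not a real configuration.
    print(output_fifo_depth_bound(n=3, b_pe_p1_sched=4, b_pe_p1_acc=4,
                                  dii_pe_sched=1))
```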


13.8.2. Max_OU_Cntrlr_Tmr

Max_OU_Cntrlr_Tmr denotes the maximum value of the timer in the OU controller. This timer is
used to control the timing of output messages.

Max_OU_Cntrlr_Tmr ≥ max(ID_Snd_Dly, IS_INIT_Snd_Dly(RMU), IS_INIT_Snd_Dly(BIU),
                        SP_INIT_Snd_Dly(RMU), SP_INIT_Snd_Dly(BIU),
                        ECHO_Snd_Dly(RMU), ECHO_Snd_Dly(BIU),
                        CD_Snd1_Dly, CD_Snd2_Dly,
                        SU_P1_Snd1_Dly(RMU), SU_P1_Snd1_Dly(BIU),
                        SU_P2_Snd1_Dly(RMU), SU_P2_Snd1_Dly(BIU), SU_DII,
                        PE_Snd1_Dly(RMU), PE_Snd1_Dly(BIU), PE_DII) (13.23)

13.8.3. Max_OU_Cntrlr_Cntr 

Max_OU_Cntrlr_Cntr denotes the maximum value of the counter in the OU controller. This counter is
used to count the number of messages in a stream during the Schedule Update protocol.

Max_OU_Cntrlr_Cntr ≥ N (13.24)

14. Specifying a particular solution

This section lists the variables of the generic model that must be specified in order to compute the RPP 
behavioral and structural parameters for a particular implementation. 


14.1. Platform Specifications 


Item  Variable   Units                Type     Value Range  RPP Constraint
1     N          Unitless             Integer  ≥ 1          ≥ 1
2     M          Unitless             Integer  ≥ 1          ≥ 1
3     f_0        Hertz                Real     > 0          Arbitrary
4     ρ_0        Unitless             Real     ≥ 0          Arbitrary
5     d_PP,l     Nominal clock ticks  Real     ≥ 0          Arbitrary
6     d_PP,h     Nominal clock ticks  Real     ≥ 0          Arbitrary
7     Δ_Comm     Local clock ticks    Integer  ≥ 1          Arbitrary
8     δ_POE      Seconds              Real     ≥ 0          Arbitrary
9     L_LS       Unitless             Integer  ≥ 0          ≥ 1
10    L_PF|PE    Bits                 Integer  ≥ 1          ≥ 1


Notes: 

• Items 1 and 2: Number of BIUs and RMUs, respectively.

• Item 3: The generic model imposes no significant restrictions on the value of f_0. The valid value range for f_0 is
determined by the implementation after place-and-route for the target technology.

• Item 4: ρ_0 = 0 corresponds to a perfect clock oscillator with no drift with respect to real time.

• Items 5 and 6: For some systems, d_PP,l and d_PP,h could be negligibly small, and thus the analysis should handle
the case in which they are set to 0.

• Item 8: δ_POE could be negligibly small for some systems.

• Item 9: Number of syndromes for each receiver.

• Item 10: Required Payload field width for PE messages.


14.2. Environmental Specifications 


Item  Variable   Units                Type     Value Range  RPP Constraint
1     δ_FCP      Nominal clock ticks  Real     ≥ 0          Arbitrary


Notes: 

• Item 1: δ_FCP could be negligibly small for some systems.


14.3. Operational-Delay Constraints 


Item  Variable                    Units              Type     Value Range  RPP Constraint
1     Δ_ID,begin                  Local clock ticks  Integer  ≥ 0          ≥ 2
2     B_ID,P0|min                 Local clock ticks  Integer  ≥ 0          = 1
3     Δ_ID,P1,RCVWND|min          Local clock ticks  Integer  ≥ 0          = 1
4     C_ID,P1                     Local clock ticks  Integer  ≥ 0          = 1
5     Δ_ID,P1,C-END               Local clock ticks  Integer  ≥ 0          = 2
6     Δ_IS,begin                  Local clock ticks  Integer  ≥ 0          ≥ 0
7     B_IS,P0|min                 Local clock ticks  Integer  ≥ 0          = 0
8     Δ_IS,P1,RCVWND|min          Local clock ticks  Integer  ≥ 0          = 0
9     B_SP,P0|min                 Local clock ticks  Integer  ≥ 0          = 1
10    B_P1|min                    Local clock ticks  Integer  ≥ 0          = 3
11    B_P2|min                    Local clock ticks  Integer  ≥ 0          = 3
12    B_P3|min                    Local clock ticks  Integer  ≥ 0          = 3
13    Δ_SP,P1,RCVWND|min          Local clock ticks  Integer  ≥ 0          = 1
14    Δ_SP,P2,RCVWND|min          Local clock ticks  Integer  ≥ 0          = 1
15    Δ_SP,P3,RCVWND|min          Local clock ticks  Integer  ≥ 0          = 1
16    Δ_SP,P4,RCVWND|min          Local clock ticks  Integer  ≥ 0          = 1
17    C_SP,P1                     Local clock ticks  Integer  ≥ 0          ≥ 4
18    C_SP,P2                     Local clock ticks  Integer  ≥ 0          ≥ 4
19    C_SP,P3                     Local clock ticks  Integer  ≥ 0          ≥ 4
20    C_SP,P4                     Local clock ticks  Integer  ≥ 0          ≥ 4
21    B_CD,P0|min                 Local clock ticks  Integer  ≥ 0          = 2
22    Δ_CD,P1,RCVWND|min          Local clock ticks  Integer  ≥ 0          = 1
23    C_CD,P1                     Local clock ticks  Integer  ≥ 0          = 1
24    B_CD,P1|min                 Local clock ticks  Integer  ≥ 0          = 3
25    Δ_CD,P2,RCVWND|min          Local clock ticks  Integer  ≥ 0          = 1
26    C_CD,P2                     Local clock ticks  Integer  ≥ 0          = 1
27    B_CD,P2|min                 Local clock ticks  Integer  ≥ 0          = 3
28    Δ_CD,P3,RCVWND|min          Local clock ticks  Integer  ≥ 0          = 1
29    C_CD,P3                     Local clock ticks  Integer  ≥ 0          = 1
30    B_CD,P3|min                 Local clock ticks  Integer  ≥ 0          = 3
31    Δ_CD,P4,RCVWND|min          Local clock ticks  Integer  ≥ 0          = 1
32    C_CD,P4                     Local clock ticks  Integer  ≥ 0          = 1
33    Δ_CD,P4,C-END               Local clock ticks  Integer  ≥ 0          = 3
34    Δ_SU,SND-MODE               Local clock ticks  Integer  ≥ 0          = 1
35    B_SU,P0|min                 Local clock ticks  Integer  ≥ 0          = 1
36    Δ_SU,P1,RCVWND|min          Local clock ticks  Integer  ≥ 0          = 1
37    C_SU,P1                     Local clock ticks  Integer  ≥ 0          = 1
38    B_SU,P1|min                 Local clock ticks  Integer  ≥ 0          = 3
39    C_SU,P2                     Local clock ticks  Integer  ≥ 0          = 2
40    B_SU,P2|min                 Local clock ticks  Integer  ≥ 0          = 3
41    C_SU,P3                     Local clock ticks  Integer  ≥ 0          = 2
42    B_SU,P3|min                 Local clock ticks  Integer  ≥ 0          = 3
43    C_SU,P4                     Local clock ticks  Integer  ≥ 0          = 1
44    Δ_SU,P4,C-END               Local clock ticks  Integer  ≥ 0          = 5
45    Δ_PE,SND-SA                 Local clock ticks  Integer  ≥ 0          = 2
46    B_PE,P0,sched|min           Local clock ticks  Integer  ≥ 0          = 2
47    Δ_PE,P1,RCVWND,sched|min    Local clock ticks  Integer  ≥ 0          = 1
48    C_PE,P1,sched               Local clock ticks  Integer  ≥ 0          = 1
49    B_PE,P1,sched|min           Local clock ticks  Integer  ≥ 0          = 4
50    C_PE,P2,sched               Local clock ticks  Integer  ≥ 0          = 1
51    Δ_PE,P0,acc_ready           Local clock ticks  Integer  ≥ 1          = 1
52    B_PE,P0,acc|min             Local clock ticks  Integer  ≥ 0          = 1
53    C_PE,P1,acc                 Local clock ticks  Integer  ≥ 0          = 1
54    Δ_PE,P1,acc_ready           Local clock ticks  Integer  ≥ 1          = 1
55    B_PE,P1,acc|min             Local clock ticks  Integer  ≥ 0          = 4
56    C_PE,P2,acc                 Local clock ticks  Integer  ≥ 0          = 1
57    Δ_PE,P2,acc,C-END           Local clock ticks  Integer  ≥ 0          = 3
58    Δ_Comp                      Local clock ticks  Integer  ≥ 1          = 1
59    Δ_PD,begin                  Local clock ticks  Integer  ≥ 0          = 1
60    Δ_FS,begin                  Local clock ticks  Integer  ≥ 0          = 0
61    Δ_CDM-CIM                   Local clock ticks  Integer  ≥ 0          = 0
62    H_P4                        Local clock ticks  Integer  ≥ 0          ≥ 4


Notes: 

• Item 1: The RPP sub-units are ready to receive a command 2 ticks after the MCU asserts the Node_Reset
signal.

• Item 2: B_ID,P0|min is determined by the minimum delay for the OU to assert the output strobe signal after the
MCU issues the command.

• Item 3: Δ_ID,P1,RCVWND|min is determined by the minimum delay for the IU to open a reception window after the
MCU issues the command.

• Item 4: C_ID,P1 is taken as the delay from the closing of the IU reception window until the IDU outputs the
corresponding results.

• Item 5: Δ_ID,P1,C-END is taken as the delay from the IDU output until the MCU reads the failure signal from the
SMU, which includes an assessment of the NDU output.

• Item 6: There is no need to delay the start of Initial Synchronization after Initial Diagnosis is complete.

• Item 7: The OUs at the BIUs are ready to send Initial Synchronization messages right after sending the Initial
Diagnosis message. Therefore, in effect, the OU is ready to send immediately at the start of Initial
Synchronization.

• Item 8: The IU can open the Initial Synchronization window just one tick after closing the window for Initial
Diagnosis. Therefore, in effect, the IU is ready to open a window immediately at the start of Initial
Synchronization.

• Item 9: B_SP,P0|min is determined by the minimum delay for the OU to set the output strobe signal after the MCU
issues the command.

• Items 10, 11, and 12: B_P1|min, B_P2|min, and B_P3|min are taken as the minimum required delays to ensure the
diagnosis is complete before the OU sends the message: 1 tick for the NDU, plus 1 tick for the SMU and MCU,
plus 1 tick for the OU.

• Items 13 and 14: Δ_SP,P1,RCVWND|min and Δ_SP,P2,RCVWND|min are determined by the minimum delay for the IU to open
a window after the MCU issues the command.

• Items 15 and 16: Δ_SP,P3,RCVWND|min and Δ_SP,P4,RCVWND|min are determined by the minimum delay for the IU to
open a window after the corresponding Accept output is asserted.

• Items 17, 18, 19, and 20: C_SP,P1, C_SP,P2, C_SP,P3, and C_SP,P4 are determined by the minimum delay to generate an
Accept output after the closing of a reception window.

• Item 21: B_CD,P0|min is determined by the minimum delay for the OU to set the output strobe signal after the MCU
issues the command.

• Item 22: Δ_CD,P1,RCVWND|min is determined by the minimum delay for the IU to open a window after the MCU
issues the command.

• Item 23: C_CD,P1 is taken as the delay from the closing of the IU reception window until the IDU outputs the
corresponding results.

• Items 24, 27, and 30: B_CD,P1|min, B_CD,P2|min, and B_CD,P3|min are taken as the minimum required delays to ensure the
diagnosis is complete before the OU sends the message: 1 tick for the NDU, plus 1 tick for the SMU and MCU,
plus 1 tick for the OU.

• Items 25, 28, and 31: Δ_CD,P2,RCVWND|min, Δ_CD,P3,RCVWND|min, and Δ_CD,P4,RCVWND|min correspond to the minimum
delays for the IU to open a window after the closing of another.

• Items 26, 29, and 32: C_CD,P2, C_CD,P3, and C_CD,P4 are taken as the delays from the closing of the IU reception
window until the IDU outputs the corresponding results.

• Item 33: Δ_CD,P4,C-END is taken as the delay from the IDU output until the MCU reads the failure signal from the
SMU, which includes an assessment of the NDU output.

• Item 34: This is the delay to send the mode message to the PE after the MCU issues the command to start the
execution of the Schedule Update protocol. The PE_OU takes 1 tick to do this.

• Item 35: B_SU,P0|min is determined by the minimum delay for the OU to set the output strobe signal after the MCU
issues the command.

• Item 36: For RPP2, Δ_SU,P1,RCVWND|min is determined by the minimum delay for the IU to open a window after
the MCU issues the command.

• Item 37: C_SU,P1 is taken as the delay from the closing of the IU reception window until the IDU outputs the
corresponding results.

• Item 38: B_SU,P1|min is taken as the minimum required delay to ensure the diagnosis is complete before the OU
sends the message: 1 tick for the NDU, plus 1 tick for the SMU and MCU, plus 1 tick for the OU.

• Items 39 and 41: C_SU,P2 and C_SU,P3 are taken as the minimum delays from the closing of the IU reception
window until the RVU result is available at the output of the OU message buffer.

• Items 40 and 42: B_SU,P2|min and B_SU,P3|min are taken as the minimum required delays to ensure the diagnosis is
complete before the OU sends the message: 1 tick for the NDU, plus 1 tick for the SMU and MCU, plus 1 tick
for the OU.

• Item 43: C_SU,P4 is taken as the delay from the closing of the IU reception window until the IDU outputs the
corresponding results.

• Item 44: Δ_SU,P4,C-END is taken as the delay from the IDU output until it is safe to issue the next command. Since
the command by the MCU to execute the PE Communication protocol is triggered by “LT = T_PE - 1”, T_PE
must be set such that T_PE - 1 coincides with or occurs later than the earliest time at which the schedule
assessment is complete. The Invalid_Schedule signal from the Schedule Processor is available 3 ticks after the IDU
output is ready. The RVU output is delayed by 1 tick at the Schedule Processor. Then, there is a 1-tick delay to
update the Schedule_Overload signal. There is a 1-tick delay to update the internal Invalid_Schedule. Finally, there
is a 1-tick delay to update the output Invalid_Schedule signal. The trigger to issue the MCU command can occur on
the same clock edge the Invalid_Schedule is updated. Δ_SU,P4,C-END = 5 gives the earliest time at which the PE
Communication command can be asserted.

• Item 45: This is the delay to send the schedule assessment (SA) message to the PE after the MCU issues the
command to start the execution of the PE Communication protocol. The PE_OU takes 2 ticks to do this.

• Item 46: B_PE,P0,sched|min is determined by the minimum delay for the OU to set the output strobe signal after the
MCU issues the command. The OU allows an extra tick to enable the Schedule Processor to retrieve the
schedule information for the first message.

• Item 47: Δ_PE,P1,RCVWND,sched|min is determined by the minimum delay for the IU to open a window after the MCU
issues the command.

• Items 48 and 50: C_PE,P1,sched and C_PE,P2,sched are taken as the delay from the closing of the IU reception window
until the IDU outputs the corresponding results.

• Item 49: This is set equal to B_PE,P1,acc|min.

• Item 51: During PE Communication, the OU at each BIU sends an accusations message whenever it is not its
turn to send PE messages. When the time comes to send an accusations message at the end of PE
Communication, the message is assumed to be ready just one tick after the last PE message is read.

• Item 52: B_PE,P0,acc|min is determined by the minimum delay for the OU to set the output strobe signal after the
accusations message becomes available.

• Item 53: C_PE,P1,acc is taken as the delay from the closing of the IU reception window until the IDU outputs the
corresponding results.

• Item 54: When the time comes to send an accusations message at the end of PE Communication, the message is
assumed to be ready just one tick after the last PE message is processed.

• Item 55: B_PE,P1,acc|min is taken as the minimum required delay to ensure the diagnosis, including processing of
the suspicions matrix, is complete before the OU sends the message: 2 ticks for the NDU to process the
syndromes including the suspicions, plus 1 tick for the SMU and MCU, plus 1 tick for the OU. The OU processes
the accusations message with the same timing as the PE messages. For the OU, the only difference in execution
is that the message is read from the NDU instead of the RVU buffer.

• Item 56: C_PE,P2,acc is taken as the delay from the closing of the IU reception window until the IDU outputs the
corresponding results.

• Item 57: Δ_PE,P2,acc,C-END is taken as the delay from the IDU output until the MCU reads the failure signal from
the SMU, which includes an assessment of the NDU output. The delay at the NDU includes the reduction of
the suspicions matrix.

• Item 58: All the operations can be performed with a data introduction interval of 1, except for the accusations
message during PE Communication, which requires an extra tick at the NDU in order to reduce the suspicions
matrix. This special case is the reason why the accusations message is sent only once.

• Item 59: The Preliminary Diagnosis observation window is opened one tick after the MCU issues the
command.

• Item 60: Frame Synchronization begins immediately after the end of the second observation window in
Preliminary Diagnosis.

• Item 61: The MCU switches from Clique Detection mode to Clique Initialization mode on the clock edge
immediately after a failure or no-clique condition is detected.

• Item 62: The minimum value of the synchronization-reset delay is determined by the time to diagnose the Input
Eligible Voters and complete the diagnosis after the Accept(ECHO) in Initial Synchronization and
Synchronization Capture. From the Accept(ECHO), it takes 1 tick for the IU to close the reception window, 1
tick for the IDU to output the final input diagnosis, 1 tick for the NDU to update the accusations, and 1 tick for
the SMU to read the accusations and update the failure signal. The synchronization-reset signal can be asserted
concurrently with this update to the SMU failure signal.
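
Several of the minimum values above are stated as sums of per-unit tick counts. The short Python sketch below simply re-adds the stage delays quoted in the notes for the 3-tick send delays (items 10 to 12 and the analogous items), for item 55, and for item 62. The dictionary layout and names are assumptions made for this illustration; only the tick counts come from the notes.

```python
# Per-stage delays, in local clock ticks, as quoted in the notes above.
STAGE_TICKS = {
    "send delay after diagnosis (items 10-12, 24, 27, 30, 38, 40, 42)": [
        ("NDU", 1),
        ("SMU and MCU", 1),
        ("OU", 1),
    ],
    "accusations send delay (item 55)": [
        ("NDU, syndromes including suspicions", 2),
        ("SMU and MCU", 1),
        ("OU", 1),
    ],
    "synchronization-reset delay (item 62)": [
        ("IU closes the reception window", 1),
        ("IDU outputs the final input diagnosis", 1),
        ("NDU updates the accusations", 1),
        ("SMU updates the failure signal", 1),
    ],
}


def total_ticks(stages):
    """Sum the tick counts of a list of (stage name, ticks) pairs."""
    return sum(ticks for _, ticks in stages)


if __name__ == "__main__":
    for name, stages in STAGE_TICKS.items():
        print(f"{name}: {total_ticks(stages)} ticks")
```

The totals agree with the constraint column of the table: 3 ticks for the send delays, 4 ticks for item 55, and a minimum of 4 ticks for item 62.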


14.4. Preservation Mode Specifications 


Item  Variable                    Units              Type     Value Range  RPP Constraint
1     Δ_SU                        Local clock ticks  Integer  ≥ 1          ≥ 1
2     Δ_SU,STREAM,P0-P2|min       Local clock ticks  Integer  ≥ 1          ≥ 1
3     Δ_PE,sched                  Local clock ticks  Integer  ≥ 1          ≥ 1
4     Δ_PE,sched-acc|min          Local clock ticks  Integer  ≥ 1          = Δ_PE,sched
5     T_CD                        Local clock ticks  Integer  ≥ 0          ≥ 1
6     T_SU                        Local clock ticks  Integer  ≥ 0          ≥ 4
7     T_PE                        Local clock ticks  Integer  ≥ 0          ≥ 7
8     T_SP                        Local clock ticks  Integer  ≥ 0          ≥ 10
9     Dflt_Num_PE_Msg             Unitless           Integer  ≥ 0          ≥ 1
10    Δ_SU,SND-MODE-READ-PE|min   Local clock ticks  Integer  ≥ -1         ≥ -1
11    Δ_PE,SND-SA-READ-PE|min     Local clock ticks  Integer  ≥ -1         ≥ -1


Notes: 

• Items 1, 2, and 3: The RPP is designed to handle a data introduction interval of 1 tick.

• Item 4: The accusations message at the end of PE Communication is sent with the same data introduction
interval as the scheduled PE messages.

• Item 5: The RPP is ready to begin the Collective Diagnosis protocol as soon as the local time is reset. Since the
MCU uses a “LT = T_CD - 1” comparison to trigger the protocol and LT ≥ 0, T_CD - 1 must be non-negative.

• Items 6, 7, and 8: The MCU must wait at least 3 ticks between commands in order to accommodate the timing
constraints at the PE_OU.

• Item 9: Number of messages per PE for the default schedule.

• Item 10: This is the desired delay from the sending of the mode message to the PE to the reading of the first PE
message during Schedule Update. The RPP can read the first message when the MCU issues the command, but
the PE_OU needs 1 tick to send the mode message out.

• Item 11: This is the desired delay from the sending of the schedule assessment (SA) message to the PE to the
reading of the first PE message during PE Communication. The RPP can read the first message 1 tick after the
MCU issues the command, but the PE_OU needs 2 ticks to send the SA message out.


14.5. Failure-Recovery Specifications 


Item  Variable   Units              Type     Value Range  RPP Constraint
1     Δ_FD|max   Local clock ticks  Integer  ≥ 1          ≥ 1
2     Δ_LR       Local clock ticks  Integer  ≥ 0          = 0
3     Δ_STM      Local clock ticks  Integer  ≥ 1          ≥ 8


Notes: 

• Item 1: Δ_FD|max is the delay to enter the local recovery. Although the SMU can assert the failure signal
immediately, the MCU requires 1 tick to read the signal and switch to a node-reset state.

• Item 2: The local recovery operation is part of the Self-Test mode.

• Item 3: The Self-Test mode consists of a 1-tick node-reset state, a 1-tick idle state, at least 3 ticks for Self_Test
commands, and 3 more ticks for the Reset command.


14.6. Clique Detection Specifications 


Item  Variable   Units              Type     Value Range  RPP Constraint
1     Δ_PD,OW    Local clock ticks  Integer  ≥ 1          ≥ 1


Notes: 

• Item 1: Each Preliminary Diagnosis observation window must have a duration of at least one tick. 


14.7. Miscellaneous Specifications 


Item  Variable         Units     Type     Value Range  RPP Constraint
1     Num_SPECIAL_Msg  Unitless  Integer  ≥ 0          = 13


Notes: 

• Item 1: The number of SPECIAL messages is known and constant. 


14.8. Additional Constraints 

The previous tables present the basic value constraints for each variable according to its function in the 
generic timing model. In addition to this, some variables are constrained by relations to other variables. 
These additional constraints are presented next. 

During Schedule Update and PE Communication, the data introduction rate must not exceed the
capability of the Communication and Computation Modules.

Δ_SU ≥ max(Δ_Comm, Δ_Comp) (14.1)

Δ_SU,STREAM,P0-P2|min ≥ max(Δ_Comm, Δ_Comp) (14.2)

Δ_PE,sched ≥ max(Δ_Comm, Δ_Comp) (14.3)

Δ_PE,sched-acc|min ≥ max(Δ_Comm, Δ_Comp) (14.4)

The protocol time triggers must not coincide with other scheduled events. This is expressed in terms
of constraints on the time gaps between protocols, which indirectly constrain the valid values of T_CD, T_SU,
T_PE, and T_SP.

Δ_CD,begin ≥ 0 (14.5)

Δ_SU,begin ≥ 0 (14.6)

Δ_PE,begin ≥ 0 (14.7)

Δ_SP,begin ≥ 0 (14.8)


The duration of a resynchronization interval must be at least as large as the maximum duration of the
Frame Synchronization protocol. This is expressed in terms of a constraint on P_min, which indirectly
constrains the value of T_SP.

P_min ≥ Δ_FS|max (14.9)

The duration of the Self-Test mode must meet the restart timing constraints.

Δ_STM ≥ ⌈(1 + ρ_0) δ_QUAD|max⌉ - Δ_LR (14.10)

The duration of each observation window during Local Diagnosis Acquisition (a.k.a. Preliminary
Diagnosis) must be at least as large as the worst-case duration of a resynchronization interval.

Δ_PD,OW ≥ P_max (14.11)
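
A candidate set of values can be screened against these relational constraints before the parameters are computed. The Python sketch below checks (14.1) to (14.4), (14.9), and (14.11). It treats every relation as a "greater than or equal to" bound, which is an assumption of this example, and the dictionary keys and sample numbers are hypothetical rather than taken from the report.

```python
def check_additional_constraints(v):
    """Return the equation numbers of the constraints that v violates.

    v maps (hypothetical) variable names to integer values in clock ticks.
    """
    # Slowest of the Communication and Computation Modules.
    dii_floor = max(v["dii_comm"], v["dii_comp"])
    checks = {
        "14.1": v["dii_su"] >= dii_floor,
        "14.2": v["dii_su_stream_p0_p2_min"] >= dii_floor,
        "14.3": v["dii_pe_sched"] >= dii_floor,
        "14.4": v["dii_pe_sched_acc_min"] >= dii_floor,
        # Resynchronization interval versus Frame Synchronization duration.
        "14.9": v["p_min"] >= v["fs_duration_max"],
        # Observation window versus worst-case resynchronization interval.
        "14.11": v["pd_window"] >= v["p_max"],
    }
    return [eq for eq, ok in checks.items() if not ok]


if __name__ == "__main__":
    candidate = {
        "dii_comm": 2, "dii_comp": 1,
        "dii_su": 2, "dii_su_stream_p0_p2_min": 2,
        "dii_pe_sched": 2, "dii_pe_sched_acc_min": 2,
        "p_min": 1000, "fs_duration_max": 400,
        "pd_window": 1200, "p_max": 1100,
    }
    print(check_additional_constraints(candidate))
```

An empty list means that none of the checked constraints is violated.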
15. Concluding remarks


ROBUS is the central feature of SPIDER, a scalable family of modular avionics architectures that
will support a range of application and reliability demands. ROBUS-2 is a developmental version of
ROBUS intended for laboratory experimentation and demonstrations of capabilities. The ROBUS
Protocol Processor is a custom-designed hardware component that implements the ROBUS functionality. 
The RPP presented in this report implements the full functionality of ROBUS-2 as described in [Torres 
05]. In addition, the RPP design has the following features. 

• Full complement of error checks as described in [Torres 05] for RPP and bus failure detection. 

• Node kind and node Id can be specified pre-synthesis or at run-time. 

• Parameterized and synthesizable VHDL description with 73 behavioral and 24 structural 
parameters. 

• Maximum throughput of one ROBUS Message per clock tick for the Schedule Update and PE
Communication protocols. At full speed, the bus can achieve over 90% processing efficiency for
PE messages, measured as the maximum number of PE messages over the total number of clock
ticks in a cycle. The overhead is due to the processing of the Collective Diagnosis, Schedule
Update, and Synchronization Preservation protocols, in addition to the end-to-end bus latency.

• Preliminary synthesis and place-and-route (PAR) results for a Xilinx Virtex-2 FPGA [XILINX
05] indicate that the RPP can run at up to 50 MHz. For a 3-BIU by 3-RMU system with 17-bit
ROBUS Messages (1 tag bit plus 16 payload bits), the RPP can process messages at the equivalent
throughput rate of 850 Mbps (million bits per second). For a physical implementation, the
actual throughput will be limited by the lower-level communication network and the interface
between the communication module and the RPP.

The full set of timing equations presented in this report has been implemented in a MATLAB
script. The process of building a particular version of the RPP consists of the following steps: (1)
Determine the values of the variables listed in Section 14 of this report; (2) Run the MATLAB script
to compute the VHDL behavioral and structural parameters; (3) Set the parameter values in the
appropriate VHDL package file; and (4) Run synthesis and PAR for the target technology.
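
The 850 Mbps figure quoted above follows directly from the clock rate and the message width, assuming one ROBUS Message is processed per clock tick. The Python sketch below reproduces that arithmetic; the function and parameter names are illustrative only.

```python
def throughput_mbps(clock_hz, tag_bits, payload_bits, messages_per_tick=1):
    """Equivalent processing throughput in millions of bits per second."""
    bits_per_message = tag_bits + payload_bits
    return clock_hz * messages_per_tick * bits_per_message / 1e6


if __name__ == "__main__":
    # 50 MHz clock, 17-bit ROBUS Messages (1 tag bit plus 16 payload bits).
    print(throughput_mbps(clock_hz=50e6, tag_bits=1, payload_bits=16))
```

This counts the tag bit as part of the throughput, which matches the 17-bit message width used in the quoted figure.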






Appendix A. Detailed specification for the RPP Input Unit 


The following are the requirements for the RPP Input Unit (IU). These requirements are intended to
guide the design of a synthesizable VHDL description. Abstract data types of scalar (e.g., Boolean,
Natural, enumerated) and composite (e.g., vector) forms are used to capture the requirements. These
types are built-in or easily expressed in the VHDL language. Lower-level circuit design matters are
assumed to be handled appropriately to ensure compliance with the stated requirements. The target
technology is field-programmable gate arrays (FPGAs) running at a minimum clock frequency of 5 MHz,
which should be easily achievable with current technology. The maximum allowed clock frequency is
determined by the implementation. The required parameters, both structural (e.g., number of input
channels) and behavioral (e.g., time to wait before asserting a signal), are introduced as needed.

The Input Unit (IU) requirements are presented as follows. First, the high-level tasks allocated to the 
IU are listed for each ROBUS protocol. Then, the basic IU functions are described conceptually. Next, 
the IU interface is described in detail. This is followed by a detailed description of the reception modes 
for nominal, error-free cases. The generation of error syndromes is addressed after that. Then, the 
required response for each MCU command is presented, including the functions to be used and timing 
descriptions. The next section lists all the predefined structural and behavioral parameters relevant to the 
design of the IU. Depending on the actual design of the IU, additional structural parameters may be 
identified. The last section presents additional synthesis-related requirements and various miscellaneous 
remarks. 


A.1. ROBUS tasks allocated to the Input Unit

Table A.1 lists the tasks allocated to the Input Unit.


A.2. Basic IU functions

The RPP Input Unit serves as an interface between the Communication Module receivers and the 
computation elements of the RPP. The structural parameter Num_OK specifies the number of receivers 
connected to the IU. For this version of the RPP, it is assumed that the signals from the receivers are 
synchronous with respect to the clock signal generated by the physical oscillator, which is denoted by 
CLK. The input buffering function is implemented using synchronous first-in first-out (FIFO) buffers. If 
the outputs of the link receiver are not synchronous with respect to the oscillator clock signal, then a 
signal synchronizer must be used. Figure A.1 illustrates this configuration. Note that the logic before the
input buffer is considered part of the Communication Module. 


[Figure A.1: Reception path for ROBUS nodes with a link receiver with asynchronous outputs. Labeled
elements: Communication Module, RPP Input Unit, CLK, FIFO.]
Table A.1: Tasks allocated to the Input Unit

| Major Mode | Minor Mode | Protocol | Process and Node Kind | IU Tasks |
|---|---|---|---|---|
| Clique Preservation | Collective Diagnosis | Collective Diagnosis | P0: BIU & RMU | No applicable tasks |
| | | | P1: BIU & RMU | Receive 1 synchronous-communication message from each opposite kind node. The execution of this process does not overlap with process P2. |
| | | | P2: BIU & RMU | Receive 1 synchronous-communication message from each opposite kind node. The execution of this process does not overlap with process P3. |
| | | | P3: BIU & RMU | Receive 1 synchronous-communication message from each opposite kind node. The execution of this process does not overlap with process P4. |
| | | | P4: BIU & RMU | Receive 1 synchronous-communication message from each opposite kind node. |
| | Schedule Update | Schedule Update | P0: BIU | No applicable tasks |
| | | | P1: RMU | Receive N synchronous-communication messages from each opposite kind node. Constant DII between messages. The execution of this process does not overlap with process P3. |
| | | | P2: BIU | Receive N synchronous-communication messages from each opposite kind node. Constant DII between messages. The execution of this process does not overlap with process P4. |
| | | | P3: RMU | Receive N synchronous-communication messages from each opposite kind node. Constant DII between messages. Load the protocol results and assess the new schedule. |
| | | | P4: BIU | Receive N synchronous-communication messages from each opposite kind node. Constant DII between messages. Load the protocol results and assess the new schedule. |
| | PE Communication | PE Broadcast | P0: BIU | No applicable tasks |
| | | | P1: RMU | Receive K_sched synchronous-communication messages from each opposite kind node. Constant DII between messages. |
| | | | P2: BIU | Receive K_sched synchronous-communication messages from each opposite kind node. Constant DII between messages. |
| | | Accusation Exchange | P0: BIU | No applicable tasks |
| | | | P1: RMU | Receive 1 synchronous-communication message from each opposite kind node. Same DII as for PE Broadcast. |
| | | | P2: BIU | Receive 1 synchronous-communication message from each opposite kind node. Same DII as for PE Broadcast. |
| | Synchronization Preservation | Synchronization Preservation | P0: BIU | No applicable tasks |
| | | | P1: RMU | Receive 1 INIT fixed-delay communication message from each opposite kind node. The execution of this process does not overlap with process P3. |
| | Collective Diagnosis | Collective Diagnosis | P0 - P4: BIU & RMU | Same as for Clique Preservation major mode |
| Clique Detection | Local Diagnosis Acquisition and Synchronization Acquisition | Local Diagnosis Acquisition | P0: BIU & RMU | Receive an indefinite number of messages in asynchronous-monitoring communication mode during two consecutive intervals of equal and predetermined duration. |
| | | Frame Synchronization | P0: BIU & RMU | Execute the Frame Synchronization protocol. Receive an indefinite number of messages in asynchronous-monitoring communication mode for the duration of the protocol. |
| | | Synchronization Capture | P0: BIU & RMU | Receive 1 ECHO fixed-delay communication message from each opposite kind node. Receive an indefinite number of messages in asynchronous-monitoring communication mode for the duration of the protocol. |
| | Collective Diagnosis Acquisition | Collective Diagnosis Acquisition | P0 - P4: BIU & RMU | Same as Collective Diagnosis for the Clique Preservation major mode |
| Self-Test | Self-Test | Self-Test | P0: BIU & RMU | No applicable tasks |
| | Reset | Reset | P0: BIU & RMU | Reset the Input Unit |
The IU handles all timing aspects related to the reception of messages. The IU implements the
reception functions to support the point-to-point communication modes: synchronous, fixed-delay, and 
asynchronous-monitoring. The IU implements the frame-synchronization function, which is essentially a 
computation-activation function based on the timing of received synchronization messages. The IU also 
implements the schedule processing function, which is responsible for storing the PE communication 
schedule, performing an assessment to determine its validity, and converting the schedule into signals 
suitable for control of the computation pipeline. The IU location at the top of the RPP data-processing 
path makes it the ideal module to control the timing of the computation pipeline. The timing of the IU 
control outputs is referenced to the local time, events generated within the RPP, or received events. The 
IU is also responsible for reading the communication error signals from the link receivers, performing in- 
line timing-error checks on received messages, and generating error syndromes for diagnostic processing 
within the RPP. 


A.2.1. Synchronous reception 

This function is presented mainly from the perspective of a particular point-to-point communication 
channel. However, the description applies to every IU reception channel. 

Synchronous communication is used with the synchronous protocols: Collective Diagnosis, Schedule
Update, PE Broadcast, Accusation Exchange, and Initial Diagnosis. For this point-to-point
communication mode, the local-time clocks of the source and receiver are assumed to be synchronized
within a particular precision bound. There is also a nominal reception delay, denoted by $R_{PP}$, with a
known error bound. Figure A.2 illustrates the relevant timing events for synchronous reception. $T_{REF}$ is a
local-time reference selected to coordinate the message transmission and reception actions. A message
sent at local time $T_{SND}$ is nominally expected to arrive $R_{PP}$ ticks later at local time $T_{RCV,E}$. With the known
synchronization precision bound and the known error in $R_{PP}$, it is possible to compute the maximum error
for $T_{RCV,E}$, denoted $\Delta_{PP,RCV|abs\text{-}max}$, for a message sent by a properly operating source. Thus, the RPP at the
receiving node expects to receive the synchronous message at any of the triggering edges of the local-time
clock in the interval:

$$[\,T_{RCV,E} - \Delta_{PP,RCV|abs\text{-}max},\; T_{RCV,E} + \Delta_{PP,RCV|abs\text{-}max}\,] \tag{A.1}$$


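[Figure: two time axes. "Local Time at sender" marks $T_{REF}$ and $T_{SND}$; "Local Time at the receiver" marks the expected-reception interval of duration $W_{RCV}$ around $T_{RCV,E}$, from $T_{RCV,E} - \Delta_{PP,RCV|abs\text{-}max}$ to $T_{RCV,E} + \Delta_{PP,RCV|abs\text{-}max} + 1$. $R_{PP}$ is the nominal delay from $T_{SND}$ to $T_{RCV,E}$.]

Figure A.2: Timeline for synchronous reception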
This interval corresponds to the triggering edges in the local-time interval:

$$[\,T_{RCV,E} - \Delta_{PP,RCV|abs\text{-}max},\; T_{RCV,E} + \Delta_{PP,RCV|abs\text{-}max} + 1\,) \tag{A.2}$$

This expected-reception interval is also called the reception window. $W_{RCV}$ denotes the duration of
the reception window:

$$W_{RCV} = 2\Delta_{PP,RCV|abs\text{-}max} + 1 \tag{A.3}$$

The send and receive actions must be coordinated to ensure that the left edge of the reception window
does not occur before $T_{REF}$, i.e., $T_{RCV,E} - \Delta_{PP,RCV|abs\text{-}max} \geq T_{REF}$. In addition, the minimum delay required by
the source to send the message and the minimum delay required by the receiver to open the reception
window, both measured with respect to $T_{REF}$, must be taken into consideration to determine $T_{RCV,E}$.
During the reception window, the receiver expects to receive exactly one message from the source. The
received message is held by the input buffer until the scheduled time for processing, denoted by $T_{PROC}$,
which must satisfy the following constraint:

$$T_{PROC} \geq T_{RCV,E} + \Delta_{PP,RCV|abs\text{-}max} + 1 \tag{A.4}$$

In order to reduce the PE-to-PE latency of the bus, the processing of synchronous messages will 
always be scheduled to begin immediately after the closing of its corresponding expected-reception 
interval. That is: 


$$T_{PROC} = T_{RCV,E} + \Delta_{PP,RCV|abs\text{-}max} + 1 \tag{A.5}$$
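The relations (A.1) through (A.5) can be checked numerically with a short sketch. The Python below is only an illustration under stated assumptions: the names `SyncWindow`, `sync_window`, and `delta_max` are hypothetical and do not come from the RPP design, and all times are taken to be integer local-clock ticks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncWindow:
    """Hypothetical container for one synchronous reception window."""
    first_edge: int  # first accepted triggering edge, T_RCV,E - delta (inclusive)
    last_edge: int   # last accepted triggering edge, T_RCV,E + delta (inclusive)
    t_proc: int      # processing start, T_RCV,E + delta + 1, as in (A.5)

    @property
    def width(self) -> int:
        """Window duration W_RCV in ticks, as in (A.3)."""
        return self.last_edge - self.first_edge + 1

    def accepts(self, arrival_tick: int) -> bool:
        """True if a message arriving at this tick falls inside interval (A.1)."""
        return self.first_edge <= arrival_tick <= self.last_edge


def sync_window(t_ref: int, t_snd: int, r_pp: int, delta_max: int) -> SyncWindow:
    """Build the window for a message sent at t_snd with nominal delay r_pp."""
    t_rcv_e = t_snd + r_pp               # nominal expected reception time
    first_edge = t_rcv_e - delta_max
    if first_edge < t_ref:               # left edge must not precede T_REF
        raise ValueError("reception window would open before T_REF")
    last_edge = t_rcv_e + delta_max
    return SyncWindow(first_edge, last_edge, last_edge + 1)


w = sync_window(t_ref=0, t_snd=4, r_pp=3, delta_max=2)
print(w.first_edge, w.last_edge, w.width, w.t_proc)  # 5 9 5 10
```

With a maximum error of 2 ticks, the window covers 5 triggering edges, matching $W_{RCV} = 2 \cdot 2 + 1$, and processing starts on the tick immediately after the window closes.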


The IU must be able to handle synchronous message streams consisting of a known number of
messages with a nominal spacing between them. Consider a generic synchronous-message stream. $K_{stream}$
denotes the total number of messages in the stream. $\Delta_{stream}$ denotes the nominal data-introduction interval
(DII) (i.e., the inverse of the nominal message rate) for the stream, measured in units of local clock ticks.
The spacing between the expected-reception intervals for the messages is a function of $\Delta_{PP,RCV|abs\text{-}max}$ and
$\Delta_{stream}$. For a given $\Delta_{PP,RCV|abs\text{-}max}$, as $\Delta_{stream}$ is reduced, the intervals get closer until they eventually abut
and then overlap. This scenario of overlapping reception windows is handled by defining an overall
reception window that applies to the entire synchronous message stream.


$T_{RCV,E,i}$ denotes the expected time of reception for stream message i, with $0 \leq i \leq K_{stream} - 1$. The
expected-reception interval for the i-th message is:

$$[\,T_{RCV,E,0} + i\,\Delta_{stream} - \Delta_{PP,RCV|abs\text{-}max},\; T_{RCV,E,0} + i\,\Delta_{stream} + \Delta_{PP,RCV|abs\text{-}max} + 1\,) \tag{A.6}$$

The scheduled time to begin processing message i, denoted by $T_{PROC,i}$, is given by:

$$T_{PROC,i} = T_{RCV,E,0} + i\,\Delta_{stream} + \Delta_{PP,RCV|abs\text{-}max} + 1 \tag{A.7}$$

For $\Delta_{stream} \leq 2\Delta_{PP,RCV|abs\text{-}max} + 1$, the expected-reception intervals are considered to form a single overall
window that applies to the entire stream. The reception window for this case is:

$$[\,T_{RCV,E,0} - \Delta_{PP,RCV|abs\text{-}max},\; T_{RCV,E,0} + (K_{stream} - 1)\,\Delta_{stream} + \Delta_{PP,RCV|abs\text{-}max} + 1\,) \tag{A.8}$$

The minimum required size for the input buffer (i.e., the minimum number of messages that it must be
able to hold) is a function of $\Delta_{PP,RCV|abs\text{-}max}$ and $\Delta_{stream}$. This minimum buffer size, or depth, is an IU
structural constraint.
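As a rough illustration of the stream relations (A.6) through (A.8), the following Python sketch computes the per-message intervals, the processing times, and the resulting reception windows. The function name `stream_windows` and its arguments are hypothetical, intervals are represented as half-open tick ranges `(start, end)`, and the sketch does not attempt to derive the input buffer depth, which the text leaves unspecified.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open tick interval [start, end)


def stream_windows(t_rcv_e0: int, k_stream: int, dii: int,
                   delta_max: int) -> Tuple[List[Interval], List[int]]:
    """Return (reception windows, processing times) for a synchronous stream.

    t_rcv_e0  : expected reception time of message 0
    k_stream  : number of messages in the stream
    dii       : nominal data-introduction interval, in ticks
    delta_max : maximum error of the expected reception time, in ticks
    """
    w_rcv = 2 * delta_max + 1
    # Per-message expected-reception intervals, as in (A.6).
    intervals = [
        (t_rcv_e0 + i * dii - delta_max, t_rcv_e0 + i * dii + delta_max + 1)
        for i in range(k_stream)
    ]
    # Processing of message i starts when its own interval closes, as in (A.7).
    t_proc = [end for (_, end) in intervals]
    if dii <= w_rcv:
        # Intervals abut or overlap: one overall window, as in (A.8).
        windows = [(intervals[0][0], intervals[-1][1])]
    else:
        windows = intervals
    return windows, t_proc


print(stream_windows(10, 3, 3, 2))  # ([(8, 19)], [13, 16, 19])
print(stream_windows(10, 3, 7, 2))  # ([(8, 13), (15, 20), (22, 27)], [13, 20, 27])
```

In the first call the DII (3) is smaller than the per-message window (5), so the three intervals merge into one window; in the second call the DII (7) leaves a gap of two ticks between consecutive windows.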


A.2.2. Fixed-delay reception 

This function is presented mainly from the perspective of a particular point-to-point communication 
path. The description applies to every IU reception channel. 

Fixed-delay communication is exclusively used with the INIT and ECHO messages of the
synchronization protocols: Synchronization Preservation, Synchronization Capture, and Initial
Synchronization. Because the information of main interest is in the time of reception of the messages, the
design of the IU must ensure that its outputs properly preserve this information.

Each received synchronization message is characterized by two items of information: the time of 
reception and the content of the message. Correspondingly, two types of IU outputs are used with fixed- 
delay reception: timing pulses and message content. A timing pulse is generated a fixed delay after the 
reception of an expected synchronization message. The message content is stored in the input buffer 
upon reception and forwarded for processing some time later. 

Two sub-modes of fixed-delay reception are defined based on whether the reception window has a 
predetermined duration or not. Fixed-delay reception with a predetermined reception window size is 
used with the Synchronization Preservation protocol, for which it is possible to define a reception window 
for each protocol stage. For this reception mode, the generation of the timing pulses and the output of the 
message content are coordinated with respect to the reception window. The reception window can be 
used as a reference for timing-error detection similarly to the way done for single-message synchronous 
reception. Fixed-delay reception without a predetermined reception window size is used with the 
Synchronization Capture and Initial Synchronization protocols. For these protocols, there are no 
dedicated synchronization-message reception windows, and there is no coordination between the 
generation of timing pulses and the output of message content. 

Figure A.3 illustrates relevant events for fixed-delay reception with a reception window of 
predetermined size. The reference time for the reception window, denoted by T REF , is based either on the 
local time or on locally generated synchronization events, depending on the protocol process. The error 
detection and voting functions for synchronization messages in the Route-and-Vote Unit (RVU) require 
that the IU generate two pulses for each input channel. The pulses are referred to as the short pulse and 
the long pulse. Both pulses are delayed by a predetermined amount with respect to the time of reception 
of the synchronization message during the reception window. The delay and duration of the pulses are 
determined by IU behavioral parameters. The pulses are generated only if a message with the expected 
content (which can be either INIT or ECHO, depending on the protocol processes being executed) is 
received during the reception window. The pulses are delayed with respect to the first message received 
with the expected content. At most one pair of short and long pulses is generated for each reception 
window. For each reception window, the IU performs empty, overload, and overrun error checks. The 
content of the received message is held in the input buffer until the time for processing, which is at the 
end of the reception window. 
[Figure: local-time axis showing the reception window, the message reception, the short pulse and the long pulse (each starting a fixed delay after the message reception), and the beginning of message-content processing at the end of the reception window.]

Figure A.3: Timing pulses for a synchronization message with a reception window of predetermined size
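The pulse-generation rule for a window of predetermined size can be sketched as follows. This is a simplified model, not the IU design: the function `fixed_delay_pulses` and its parameters are hypothetical, both pulses are assumed to start after the same delay, and pulses are represented as half-open tick intervals.

```python
from typing import Iterable, Optional, Tuple

Pulse = Tuple[int, int]  # half-open tick interval [start, end)


def fixed_delay_pulses(arrivals: Iterable[Tuple[int, str]], expected: str,
                       window: Tuple[int, int], delay: int,
                       short_len: int, long_len: int
                       ) -> Optional[Tuple[Pulse, Pulse]]:
    """Return (short pulse, long pulse) for one input channel, or None.

    arrivals : (tick, content) pairs received on the channel
    expected : expected content for this window, "INIT" or "ECHO"
    window   : reception window as a half-open tick interval [open, close)
    """
    w_open, w_close = window
    # Only messages with the expected content inside the window can trigger.
    hits = [tick for tick, content in arrivals
            if w_open <= tick < w_close and content == expected]
    if not hits:
        return None  # no pulses are generated for this window
    start = min(hits) + delay  # pulses follow the FIRST matching message
    return (start, start + short_len), (start, start + long_len)


arrivals = [(3, "ECHO"), (12, "INIT"), (14, "INIT"), (30, "INIT")]
print(fixed_delay_pulses(arrivals, "INIT", (10, 15), 2, 1, 4))
# ((14, 15), (14, 18))
```

The second INIT message at tick 14 falls inside the window but produces no additional pulses, and the message at tick 30 is outside the window altogether.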


For fixed-delay reception without a predetermined reception window size, the short and long pulses 
are triggered by the reception of an expected synchronization message, and the delay and duration of the 
pulses are the same as for the case of reception with dedicated reception windows. Because the Initial 
Synchronization protocol does not preclude the possibility of receiving valid and nearly simultaneous 
INIT and ECHO messages, the IU must have separate outputs for INIT and ECHO pulses. Once the
process of generating the INIT pulses is triggered by the reception of a valid message, any new received 
INIT message is simply ignored. A similar constraint applies to ECHO messages. For this fixed-delay 
reception case, the content of received synchronization messages is handled using the asynchronous- 
monitoring reception. 
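The latching behavior described above, with independent INIT and ECHO outputs that each respond only to the first valid message, can be illustrated with a small Python sketch. The function `first_triggers` is hypothetical, and the sketch assumes that later messages of the same kind are ignored for the remainder of the protocol execution.

```python
from typing import Dict, Iterable, Tuple


def first_triggers(arrivals: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Map each message kind ("INIT", "ECHO") to the tick of its first arrival.

    arrivals : (tick, content) pairs received on one input channel
    """
    triggers: Dict[str, int] = {}
    for tick, content in sorted(arrivals):
        # Separate latches: an INIT never blocks an ECHO, and vice versa.
        if content in ("INIT", "ECHO") and content not in triggers:
            triggers[content] = tick
    return triggers


print(first_triggers([(5, "INIT"), (5, "ECHO"), (9, "INIT"), (11, "ECHO")]))
# {'ECHO': 5, 'INIT': 5}
```

Both kinds trigger at tick 5 even though they arrive simultaneously, while the later INIT and ECHO messages have no effect.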


A.2.3. Asynchronous-monitoring reception 

Asynchronous monitoring is performed during Local Diagnosis Acquisition and Synchronization 
Acquisition in the Clique Detection major mode, and during Initial Synchronization in the Clique 
Initialization major mode. The reception intervals during which asynchronous monitoring is performed 
are called observation windows (or intervals). During an observation window, each received message is 
stored for one clock tick in the corresponding IU input buffer and then it is forwarded to the IDU, where 
the actual input-error monitoring takes place. In the Clique Detection mode, the observation window is
opened upon transitioning to Local Diagnosis Acquisition, and it remains open until the RVU computes the
reference synchronization event during Synchronization Capture. At that point, the IU closes the
observation window and the IDU generates its final monitoring results. In the Clique Initialization mode,
the observation window is opened at the start of the Initial Synchronization protocol and it is closed when
the RVU computes the final synchronization reference event.


A.2.4. Frame synchronization 

The Frame Synchronization function finds the gap between consecutive executions of the 
Synchronization Preservation protocol and triggers the activation of the Synchronization Capture 
protocol. Figure A.4 presents the description of the process. 
1. Wait until the IDU readies the set of initial eligible sources.

2. Load the initial eligible sources and start the gap timer.

3. Loop:

   3.1. If an error is detected, remove the corresponding source from the eligible
        sources.

        3.1.1. Communication checks for each opposite-kind source:

               3.1.1.1. Expecting no link errors.

        3.1.2. In-line checks for each opposite-kind source:

               3.1.2.1. At most one ECHO message expected during the execution of this loop.

   3.2. If an ECHO message is received from an eligible source, then restart the gap
        timer. Else, if the gap timer has expired, then assert Done and exit.


Figure A.4: Frame Synchronization process 

Major steps 1, 2, and 3 describe separate activities performed on a clock-tick basis. During step 1, the 
protocol waits until the Input Diagnostics Unit (IDU) determines the initial eligible voters based on the 
observations during Local Diagnosis Acquisition. During step 2, the protocol loads the eligible sources 
and starts the gap timer. The “gap timer” is a time-out timer with a predetermined time-out threshold. 
Step 3 is executed continuously from the second tick on until the process completes and the Done signal 
is asserted. Steps 3.1 and 3.2 are executed sequentially every tick. In step 3.1, the eligible sources are 
updated based on the listed error checks. The communication checks are performed by the link receivers 
of the Communication Module. The definition of these checks is determined by the design of the 
communication links. The in-line check detects an error when a total of two or more ECHO messages are 
received from any opposite-kind source during the execution of the loop. In step 3.2, the gap timer is
restarted every time an ECHO message is received from an eligible source. Note that if an ECHO 
message is received from a currently eligible source but an error is simultaneously detected for that 
source, then the source is declared ineligible and the message is not allowed to trigger a restart of the gap 
timer. If the gap timer expires and no new ECHO messages are received from eligible sources, then the 
Done signal is asserted. The expiration of the gap timer triggers the activation of the IU fixed-delay
reception mode to handle synchronization messages for the Synchronization Capture protocol. The errors 
detected during the execution of the Frame Synchronization process become an error-syndrome output by 
the IU upon completion of the process. 
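The loop of the Frame Synchronization process can be modeled tick by tick, as in the Python sketch below. It is a behavioral illustration only. The function `frame_sync`, the per-tick event dictionary, and the timer convention (the timer counts ticks since its last restart and expires when it reaches `gap_timeout`) are assumptions made for this example.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Per-tick events: source id -> (link_error, echo_received)
Events = Dict[int, Tuple[bool, bool]]


def frame_sync(initial_eligible: Iterable[int], ticks: Iterable[Events],
               gap_timeout: int) -> Tuple[Optional[int], Set[int], Set[int]]:
    """Return (tick at which Done is asserted or None, eligible sources, errors)."""
    eligible = set(initial_eligible)      # step 2: load eligible sources
    errors: Set[int] = set()              # error syndrome accumulated by the loop
    echo_count: Dict[int, int] = {}
    timer = 0                             # step 2: start the gap timer
    for t, events in enumerate(ticks):    # step 3: executed every tick
        # Step 3.1: update the eligible sources from the error checks.
        for src, (link_error, echo) in events.items():
            if echo:
                echo_count[src] = echo_count.get(src, 0) + 1
            if link_error or echo_count.get(src, 0) >= 2:
                errors.add(src)
                eligible.discard(src)
        # Step 3.2: only an ECHO from a still-eligible source restarts the timer.
        if any(echo and src in eligible for src, (_, echo) in events.items()):
            timer = 0
        else:
            timer += 1
            if timer >= gap_timeout:
                return t, eligible, errors
    return None, eligible, errors


ticks = [{1: (False, True)}, {}, {2: (True, True)}, {3: (False, True)}, {}, {}, {}]
print(frame_sync({1, 2, 3}, ticks, gap_timeout=3))  # (6, {1, 3}, {2})
```

Source 2 reports a link error in the same tick as its ECHO message, so it is removed first and its message does not restart the timer; the ECHO from source 3 does, and the gap is found three ticks later.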


A.2.5. Schedule processing 

The communication schedule specifies the number of messages to be transmitted by each PE during 
the PE Communication mode. The IU schedule processing function consists of three elements: load the 
results of the schedule update, assess the new schedule, and process the active schedule to control the
computation pipeline during the PE Communication mode. 

Let N denote the number of BIUs, which is assumed to equal the number of PEs attached to the bus. 
The Schedule Update protocol is applied N times in order to determine the number of messages to be 
broadcast by each PE. For each PE, the result of the Schedule Update protocol can be a DATA message 
with the number of scheduled messages in the payload field, or a PE_ERROR message indicating that the 
protocol was unable to determine a valid number of messages. The IU loads these results from the RVU
as a series of N consecutive messages sorted by PE identification number in ascending order.

The assessment of the schedule update produces one of three results: invalid, valid, or zero. A 
schedule is invalid if the result of the Schedule Update protocol is PE_ERROR for any PE, or if the total 
number of scheduled messages exceeds the maximum number of PE messages that the ROBUS can 
process during the PE Communication mode. A schedule is valid if it is not invalid. A zero schedule is a 
special case of a valid schedule in which the number of scheduled messages is zero for every PE. The IU 
must report to the MCU the result of the schedule assessment. 

The activity during the next PE Communication mode depends on the result of the schedule update 
assessment. If the result is valid, the new schedule is used. If the result is zero, there will be no bus 
activity. If the result is invalid, a default schedule is used. The default schedule is constant during run- 
time and is known to all the ROBUS nodes and the PEs. For this version of the ROBUS, the default 
schedule allocates the same number of transmissions for each PE. 
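The assessment rules and the selection of the schedule for the next PE Communication mode can be summarized in a short Python sketch. The function names, the use of `None` to stand for a PE_ERROR result, and the `max_total` argument are illustrative assumptions, not part of the IU interface.

```python
from typing import List, Optional, Sequence


def assess_schedule(results: Sequence[Optional[int]], max_total: int) -> str:
    """Classify a schedule update as "invalid", "zero", or "valid".

    results   : per-PE message counts in ascending PE order; None = PE_ERROR
    max_total : maximum number of PE messages the bus can process per mode
    """
    if any(r is None for r in results):
        return "invalid"                  # PE_ERROR for at least one PE
    if sum(results) > max_total:
        return "invalid"                  # schedule exceeds bus capacity
    if all(r == 0 for r in results):
        return "zero"                     # special case of a valid schedule
    return "valid"


def next_schedule(results: Sequence[Optional[int]], max_total: int,
                  default: Sequence[int]) -> List[int]:
    """Select the schedule used during the next PE Communication mode."""
    status = assess_schedule(results, max_total)
    if status == "invalid":
        return list(default)              # fall back to the constant default
    return [int(r) for r in results]      # valid or zero: use the new schedule


print(assess_schedule([2, 0, 1], 8))            # valid
print(assess_schedule([0, 0, 0], 8))            # zero
print(next_schedule([2, None, 1], 8, [1, 1, 1]))  # [1, 1, 1]
```

A zero schedule is returned unchanged, which corresponds to no bus activity during the next PE Communication mode.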


A.2.6. Pipeline control 

The IU controls the transfer of data to the computation pipeline. The timing of the IU control outputs 
is referenced to the local time, events generated within the RPP, or received events. For diagnostic 
purposes, the IU must also indicate when it has opened a reception or observation window. For 
synchronous reception, the IU must signal when it is time to begin processing the received messages. The 
IU must signal when it has enabled fixed-delay reception for synchronization messages. For 
asynchronous-monitoring, the IDU needs to know when each new message is received. 

In addition to timing control, for some protocols the IU must also provide information about the nature 
of the data. For example, if a message stream is being received, the IU must signal when the end of the 
stream has been reached. During PE Communication mode, the IU must indicate which PE is the source 
for each scheduled message, and when the stream is switching from PE Broadcast messages to 
Accusation Exchange messages. 


A.2.7. Error-syndrome generation 

The IU generates six types of error syndromes based on the error checks performed. The IU monitors 
the error signals from the link receivers and generates corresponding syndromes. The IU generates error 
syndromes based on the checks performed during the execution of the Frame Synchronization protocol. 
The IU performs empty, overload, and overrun checks and generates corresponding syndromes when 
receiving messages during windows with predetermined size. Between reception windows, when no 
messages are expected, the IU monitors the inputs for the arrival of unexpected messages and generates 
corresponding error syndromes. 


A.3. Interface 

Figure A.5 shows the signal groups for the Input Unit interface. 
Inputs: Oscillator Clock; MCU Command; Inputs from the Communication Module; RVU Synchronization Events; RVU Processing Results; IDU Node Diagnosis

Outputs: Received Synchronization Events; Received Message Content; Pipeline Control; Error Syndromes; Immediate Error Syndromes; Schedule Assessment

Figure A.5: Input Unit interface: signal groups


Table A.2 describes the interface signals for the Input Unit. The “Direction” column identifies the 
signal as an input (i.e., generated by another unit) or an output (i.e., generated by the IU). The “Type” 
column specifies the structure and value set for each signal. The basic scalar types are the following. 

• Bit: value set {0, 1 } 


• Boolean: value set {TRUE, FALSE} 

• Natural: value set { non-negative integers in the range indicated in the table } 

• Enumerated: value set {as listed in the table} 

The composite types are the following. 

• Boolean Vector (BV): a one-dimensional array of Boolean elements addressed by a Natural-type
index in the specified range. Type BV(a .. b) corresponds to a Boolean Vector with index in the 
Natural value range a to b. 

• Boolean Vector of Vectors (BVV): a two-dimensional array of Boolean elements addressed by 
row and column Natural-type indexes. Type BVV(a .. b)(c .. d) corresponds to a Boolean Vector 
of Vectors with row index in the Natural value range a to b and column index in the Natural value 
range c to d. 

• ROBUS Message (RM): a two-element record: (Tag, Payload). See XXX for a detailed 
description of value sets for the Tag and Payload fields. 

• ROBUS Message Vector (RMV): a one-dimensional array of ROBUS Message elements 
addressed by a Natural-type index in the specified range. Type RMV(a .. b) corresponds to a 
ROBUS Message Vector with index in the Natural value range a to b. 

The function max(a,b) returns the larger of the Natural-valued variables a and b. In addition, the
following symbols are used: N = number of BIUs; Ω = number of nodes of the opposite kind; and L_ls =
number of syndrome bits from each ROBUS link receiver.

There is no required default value for some IU output signals. This means that an implementation 
does not need to have a particular default value for those signals. If having a default value is desired, then 
any arbitrary value within the value set of the signal may be chosen. Arbitrary default values are 
indicated in Table A.2 as “don’t care”. 
Table A.2: Input-Unit Interface

Signal Group

Signal Name

Direction

Type

Description


MCU_Schedule_Status 

Input 

Enumerated: 

(Valid, Zero, Invalid) 

• Assessment result for latest schedule update 

• The value of this signal is valid only when MCU_Ready is 
asserted. 


MCU_Output_Enable 

Input 

Boolean 

• Enable control for the RPP broadcast output (i.e., output to 
opposite-kind nodes) 

• Asserted positive (i.e., TRUE when the output is enabled) 

• The value of this signal is valid only when MCU_Ready is 
asserted. 


MCU_Ready 

Input 

Boolean 

• Indicates when the MCU is issuing a new mode command 

• Asserted positive (i.e., TRUE when a new command is 
available) 

Inputs from the 
Communication 
Module 

CM_Message_In 

Input 

RMV(1 .. Ω)

• Message content output by the link receivers 

• Each vector element is independent from the others 

• The value of signal CM_Message_In(i) for input channel i 
is valid only when CM_Strobe_In(i) is asserted. 


CM_Syndrome_In 

Input 

BVV(1 .. Ω)(1 .. L_ls)

• Error syndromes generated by the link receivers

• Each element is independent from the others

• Asserted positive (i.e., TRUE when an error is detected)

• The value of signal CM_Syndrome_In(i) for input channel
i is valid only when CM_Strobe_In(i) is asserted.


CM_Strobe_In 

Input 

BV(1 .. Ω)

• Indicates when a link receiver has a new message or is 
reporting the detection of an error 

• When signal CM_Strobe_In(i) is asserted, the IU always 
reads both CM_Message_In(i) and CM_Syndrome_In(i). 

• Each vector element is independent from the rest 

• Asserted positive (i.e., TRUE when a new input is 
available) 

• The value of this signal is always valid, except when 
MCU_Node_Reset is asserted. 

RVU Synchronization 
Events 

RVU_Accept_INIT 

Input 

Boolean 

• Output of the Accept(INIT) function 

• Asserted positive (i.e., TRUE when Accept is triggered) 


RVU_Accept_ECHO 

Input 

Boolean 

• Output of the Accept(ECHO) function 

• Asserted positive (i.e., TRUE when Accept is triggered) 

RVU Processing 
Results 

RVU_Transform_Result 

Input 

RM 

• Result of processing the content of received messages 

• The value of this signal is valid only when RVU_Ready is 
asserted. 

RVU_Ready 

Input 

Boolean 

• Indicates when the RVU has a new content result 

• Asserted positive (i.e., TRUE when a new result is 
available) 


RVU_Last_Message 

Input 

Boolean 

• Indicates when a result is the last in a sequence 

• Asserted positive (i.e., TRUE when a result is the last one) 

• The value of this signal is valid only when RVU_Ready is 
asserted. 

IDU Node Diagnosis 

IDU_Invalid_Input 

Input 

BV(1 .. Ω)

• Result of error detection performed on the received 
messages 

• Each vector element is independent from the others 

• Asserted positive (i.e., TRUE when the corresponding input 
channel is invalid or untrustworthy) 

• The value of this signal is valid only when IDU_Ready is 
asserted. 


IDU_Ready 

Input 

Boolean 

• Indicates when the IDU has a new result 

• Asserted positive (i.e., TRUE when a new result is 
available) 

Received 

Synchronization 

Events 

IU_Init 

Output 

BVV(1 .. Ω)(0 .. 1)

• Timing pulses for received INIT synchronization messages

• Each row is independent from the others

• For input channel i, IU_Init(i)(0) = short pulse and
IU_Init(i)(1) = long pulse

• Asserted positive (i.e., TRUE during a pulse) 

• The value of this signal is valid only during the execution of 
the synchronization protocols. 

• Default value: Don’t Care 


IU_Echo 

Output 

BVV(1 .. Ω)(0 .. 1)

• Timing pulses for received ECHO synchronization
messages

• Each row is independent from the others

• For input channel i, IU_Echo(i)(0) = short pulse and
IU_Echo(i)(1) = long pulse

• Asserted positive (i.e., TRUE during a pulse) 

• The value of this signal is valid only during the execution of 
the synchronization protocols. 

• Default value: Don’t Care 

IU_Last_Message 

Output 

Boolean 

• Indicates that the last message in a stream has been reached 
during Schedule Update and PE Communication mode 

• Asserted positive (i.e., TRUE when the last message has 
been reached) 

• The value of this signal is valid only when IU_Ready is 
asserted during Schedule Update and PE Communication. 

• Default value: Don’t Care 

Error Syndromes 

IU_Unexpected_Message 

Output 

BV(1 .. Ω)

• Indicates that an unexpected message has been received

• Each vector element is independent from the others

• Asserted positive (i.e., TRUE when an error is detected)

• The value of signal IU_Unexpected_Message(i) is valid
only when IU_Receiving is not asserted and
IU_Strobe_Out(i) is asserted.

• Default value: Don’t Care 

IU_Link_Error 

Output 

BVV(1 .. Ω)(1 .. L_ls)

• Indicates that a link error has been reported by the 
Communication Module receivers 

• The output of this syndrome is coordinated with the output 
of message content. 

• Each element is independent from the others 

• Asserted positive (i.e., TRUE when an error is detected) 

• The value of signal IU_Link_Error(i) is valid only when 
IU_Receiving is asserted and, depending on the reception 
mode, when IU_Strobe_Out(i) or when IU_Ready is 
asserted. 

• Default value: Don’t Care 

IU_Empty_Buffer 

Output 

BV(1 .. Ω)

• Indicates that the input buffer is unexpectedly empty 

• The output of this syndrome is coordinated with the output 
of message content. 

• Each element is independent from the others 

• Asserted positive (i.e., TRUE when an error is detected) 

• The value of this signal is valid only when IU_Receiving is 
asserted and IU_Ready is asserted. 

• Default value: Don’t Care 



IU_Imm_Buffer_Overload 

Output 

BV(1 .. Ω)

• Indicates that a link error has been reported by the 
Communication Module receivers 

• The output of this syndrome is not coordinated with the 
output of message content. 

• Each element is independent from the others 

• Asserted positive (i.e., TRUE when an error is detected) 

• The value of signal IU_Imm_Link_Error(i) is valid when 
IU_Strobe_Out(i) is asserted. 

• Default value: Don’t Care 

Schedule Assessment 

IU_Zero_Schedule 

Output 

Boolean 

• Indicates when the number of scheduled messages for the 
latest schedule update is zero.

• Asserted positive (i.e., TRUE when there are no scheduled 
messages) 

• The value of this signal is arbitrary during Schedule Update 
until the assessment of the new schedule is complete. At all 
other times, the value is constant and valid. 

• Default value: Don’t Care 


IU_Invalid_Schedule 

Output 

Boolean 

• Indicates when the latest schedule update is invalid.

• Asserted positive (i.e., TRUE when the downloaded 
schedule is invalid) 

• The value of this signal is arbitrary during Schedule Update 
until the assessment of the new schedule is complete. At all 
other times, the value is constant and valid. 

• Default value: TRUE 
A.4. Message reception

This section provides a detailed description of the reception modes for nominal, error-free cases. The 
following basic conventions are adopted in the timing diagrams. Other diagrammatic conventions are 
explained as needed. 

• The Clk signal is represented by tick marks on a time axis indicating the timing of the triggering
edges (i.e., 0 → 1 transitions).

• For scalar Boolean variables, a value of TRUE is represented by an unlabeled gray box and a
value of FALSE by an empty section.

• For scalar and vector variables, a labeled gray box is used to indicate that the signal has a certain
value, which may or may not be known a priori but is not arbitrary.

• A crosshatched section in the waveform of a particular signal indicates that the signal may have
an arbitrary value (i.e., a “don’t care”) during that section of the waveform.


A.4.1. Synchronous reception 

Synchronous reception is characterized by the use of time-referenced expected-reception intervals. 
The Input Unit handles this reception mode using a uniform processing model whereby messages are 
always part of a stream of known size, which can be greater than or equal to one. Each synchronous 
message has a corresponding expected-reception interval of predetermined duration. The size of the 
message reception interval for the Initial Diagnosis protocol is specified by parameter ID_Wnd_Sz. For 
the other synchronous protocols, the size of the reception interval for each expected message is given by 
parameter Syncns_Wnd_Sz. 

Figure A.6 shows an example of the timing for synchronous reception of a one-message stream. For
illustration purposes only, the duration of an expected-reception interval for a synchronous message is
denoted by $W_{RCV}$, which is measured in units of local-clock ticks. $W_{RCV}$ = 5 is an arbitrary value chosen
for illustrative purposes and is not meant to represent a typical value. A single generic input channel i is
shown. The IU expects to receive a message from each opposite-kind source during the reception
interval. Only messages that arrive during the reception interval are accepted. IU_Receiving is set TRUE
to indicate that the IU is expecting to receive a message. Note that IU_Receiving is delayed by one clock
tick with respect to the expected-reception interval. The assertion of IU_Ready indicates that the end of
the reception interval has been reached and the result is ready for processing. The IU_Strobe_Out(i) is
always asserted one tick after a message is received.

Figure A.6: Synchronous reception of a one-message stream with $W_{RCV}$ = 5


There are two reception cases for multi-message streams. In the simplest case, the messages have
expected-reception windows that neither overlap nor abut end-to-end. Figure A.7 shows an example of
reception for a three-message stream (i.e., K = 3) with $W_{RCV}$ = 5 and DII = 7. As can be seen, there is a
separate reception window for each message and the reception activity for each is the same as for the case
of reception of a one-message stream. Note that IU_Strobe_Out(i) is asserted one tick after a message is
received, and IU_Ready indicates when the result is ready for processing.

Figure A.7: Synchronous reception of a multi-message stream with separate reception windows
(K = 3, $W_{RCV}$ = 5, DII = 7)


The second case of multi-message stream reception involves expected-reception intervals that overlap 
or coincide at the ends. For this case, the messages are received with a single continuous reception 
window. All the messages are expected to arrive within this window with an average DII approximately 
equal to the nominal value. Figure A.8 shows an example of reception for a stream of three messages
(i.e., K = 3) with $W_{RCV}$ = 5 and DII = 3. The IU outputs the messages at the times corresponding to the
end of their respective expected-reception intervals, similarly to the way it is done for the other 
synchronous reception cases. 


[Figure: timing diagram with a single reception window spanning $W_{RCV}$ followed by two DII intervals. Signals shown: Clk, CM_Message_In(i), CM_Strobe_In(i), IU_Message_Out(i), IU_Strobe_Out(i), IU_Receiving, IU_Ready.]

Figure A.8: Synchronous reception of a multi-message stream with a continuous reception window
(K = 3, $W_{RCV}$ = 5, DII = 3)
A.4.2. Asynchronous-monitoring reception

In this reception mode, the IU receives messages without knowing in advance exactly when the 
observation window will end. Signal IU_Receiving remains high during the observation window. The 
IU closes the observation window upon detection of a certain expected event generated locally within the 
RPP. The IU is required to output every received message with exactly one tick delay from the time of 
reception. Figure A.9 shows a reception example for an arbitrary input pattern. 

Figure A.9: Asynchronous-monitoring reception 

A.4.3. Fixed-delay reception 

Fixed-delay reception is used only with the INIT and ECHO messages used in the synchronization 
protocols. For this reception mode, the IU outputs the timing of reception separately from the content of 
the received messages. There are two cases of fixed-delay reception, depending on whether a 
predetermined expected-reception interval is defined or not. 

For fixed-delay reception with an expected-reception interval of predetermined size, the location of the 
intervals is referenced to the local time or to local events generated within the RPP during the execution 
of the Synchronization Preservation protocol. During a particular reception interval, only a single 
synchronization message with a specific content is expected from each input channel. The size of the 
reception intervals varies depending on the synchronization protocol process being executed. The size of 
the reception windows is specified by the behavioral parameters SP_INIT_Wnd_Sz(MCU_Node_Kind) 
for INIT messages and SP_ECHO_Wnd_Sz(MCU_Node_Kind) for ECHO messages. Note that these 
parameters are a function of the node kind, which is specified by signal MCU_Node_Kind. Figure A.10 
shows an example of fixed-delay reception with an expected-reception interval of predetermined size. 
W_RCV = 5 is an arbitrary value chosen for illustrative purposes only. The shown delay and the duration of 
the short and long pulses are also arbitrary. For this particular example, only one INIT message is 
expected during the reception window. The IU_Init(i) pulses are only triggered by the reception of the 
first INIT message. Any additional messages received from opposite-kind source i during the reception 
window do not trigger additional pulses. If no INIT messages are received during the window, then the 
pulses are not generated. Notice that the content of the message is output after the end of the reception 
interval, similarly to the way it is done for synchronous reception. 

Figure A.10: Fixed-delay reception with a predetermined reception window size (W_RCV = 5) 

When fixed-delay reception without a predetermined expected-reception interval is used, the IU 
receives messages without knowing in advance exactly when the reception or observation window will 
end. At a specific point in time, the IU enables a pulse-generation function to start monitoring the inputs 
for INIT or ECHO messages. The short and long pulses are triggered by the reception of synchronization 
messages. The content of the messages is handled according to the asynchronous-monitoring reception 
rules. Figure A.11 shows an example for an ECHO message. The pulse-generation function generates 
one set of pulses when the expected message is received (in this example, an ECHO message). After that, 
no additional pulses are generated. If the expected message is not received, then the pulses are not 
generated. 

Figure A.11: Fixed-delay reception without a predetermined reception window size 


The delay and duration of the pulses are independent of the protocol being executed. For INIT 
messages, the pulses are generated based on the IU behavioral parameters 
INIT_Pls_Dly(MCU_Node_Kind) and INIT_Skw(MCU_Node_Kind), both of which are functions of the 
local node kind. 

• $\Delta_{\mathrm{delay}}|_{\mathrm{INIT}} = \mathrm{INIT\_Pls\_Dly}(\mathrm{MCU\_Node\_Kind})$   (A.9) 

• $\Delta_{\mathrm{short}}|_{\mathrm{INIT}} = \mathrm{INIT\_Skw}(\mathrm{MCU\_Node\_Kind}) + 1$   (A.10) 

• $\Delta_{\mathrm{long}}|_{\mathrm{INIT}} = 2 \cdot \mathrm{INIT\_Skw}(\mathrm{MCU\_Node\_Kind}) + 1$   (A.11) 

The pulses for ECHO messages are generated based on the IU parameters 
ECHO_Pls_Dly(MCU_Node_Kind) and ECHO_Skw(MCU_Node_Kind). 

• $\Delta_{\mathrm{delay}}|_{\mathrm{ECHO}} = \mathrm{ECHO\_Pls\_Dly}(\mathrm{MCU\_Node\_Kind})$   (A.12) 

• $\Delta_{\mathrm{short}}|_{\mathrm{ECHO}} = \mathrm{ECHO\_Skw}(\mathrm{MCU\_Node\_Kind}) + 1$   (A.13) 

• $\Delta_{\mathrm{long}}|_{\mathrm{ECHO}} = 2 \cdot \mathrm{ECHO\_Skw}(\mathrm{MCU\_Node\_Kind}) + 1$   (A.14) 
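
As a worked example of Equations (A.9) through (A.14), the Python sketch below evaluates the pulse delay and the short and long pulse durations using the typical parameter values listed in Table A.4. The dictionary layout and the function name are hypothetical and chosen only for illustration.

```python
# Typical values from Table A.4, in local clock ticks, keyed by node kind.
PARAMS = {
    "INIT": {"Pls_Dly": {"RMU": 8, "BIU": 10}, "Skw": {"RMU": 3, "BIU": 4}},
    "ECHO": {"Pls_Dly": {"RMU": 10, "BIU": 8}, "Skw": {"RMU": 3, "BIU": 3}},
}


def pulse_timing(msg_type, node_kind):
    """Evaluate the delay, short, and long pulse values for one message type."""
    delay = PARAMS[msg_type]["Pls_Dly"][node_kind]
    skew = PARAMS[msg_type]["Skw"][node_kind]
    return {"delay": delay, "short": skew + 1, "long": 2 * skew + 1}


init_rmu = pulse_timing("INIT", "RMU")
init_biu = pulse_timing("INIT", "BIU")
echo_biu = pulse_timing("ECHO", "BIU")
```

For an RMU with the typical values, the INIT pulses have a delay of 8 ticks, a short pulse of 4 ticks, and a long pulse of 7 ticks.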

A.5. Error-syndrome generation 


This section describes the generation of the error syndromes. 


A.5.1. IU_Unexpected_Message 

An unexpected-message error syndrome is asserted for a particular input channel whenever a message 
is received while the IU is not expecting to receive messages (i.e., when there is no active reception or 
observation window). This type of error is signaled at the outputs of the IU one tick after the message is 
received. Figure A.12 shows various reception cases relative to reception or observation windows. The 
value of syndrome IU_Unexpected_Message(i) is valid when IU_Receiving is not asserted and 
IU_Strobe_Out(i) is asserted. For these conditions, IU_Unexpected_Message(i) must not be asserted 
unless an error has been detected. 

Figure A.12: Generation of the unexpected-message syndrome 


A.5.2. IU_Empty_Buffer 

For a particular input channel, an empty-buffer error occurs when it is time for the IU to output a 
message and the input buffer is empty. This type of error can only happen during synchronous reception 
or fixed-delay reception with a predetermined reception window size. When receiving in asynchronous- 
monitoring mode, this error cannot happen because messages are always output one tick after they are 
received. Two cases need to be considered based on the number of expected messages during a reception 
window: single-message reception and multi-message reception. The value of syndrome 
IU_Empty_Buffer(i) is valid when IU_Receiving is asserted and either IU_Strobe_Out(i) is asserted for 
asynchronous-monitoring reception or IU_Ready is asserted for the other reception modes. 

Figure A.13 shows an example for the case of a reception window with only one expected message. 
Here, IU_Empty_Buffer(i) is asserted at the end of the reception window because no messages were 
received from channel i during the window. Note that IU_Empty_Buffer(i) is asserted at the time to 
output the expected message as indicated by the IU_Ready signal. There is no required specific default 
value for the elements of the IU_Message_Out signal. This example applies to the reception of single- 
message synchronous streams, multi-message synchronous streams with separate reception windows, and 
fixed-delay reception with reception windows of predetermined sizes. 

Figure A.14 shows two examples for the case of reception windows with multiple expected messages. 
These examples apply only to synchronous reception. Two streams are expected, each one with three 
messages. The expected-reception interval for each individual message has a duration of five ticks (i.e., 
W_RCV = 5), and the streams have a nominal DII of three ticks. For the first reception window, the second 
message from input channel i is not received, and thus the IU outputs an arbitrary value and asserts the 
empty-buffer syndrome. A late reception of the second message would have also resulted in the same 
error, although the final output message would have been B instead of C. In addition, if C had been 
received before the end of the second expected-reception interval, it would have been output at the time 
for message B as the second received message. For the second window, since the third message is 
missing, the IU asserts the empty-buffer syndrome with the last IU_Ready. 

Figure A.13: Generation of the empty-buffer syndrome for a single-message reception window 

Figure A.14: Generation of the empty-buffer syndrome for multi-message reception windows 

A.5.3. IU_Input_Overrun 

This error check is meant to detect when more messages than expected are received during a reception 
window. For a particular input channel, an input-overrun error occurs if messages remain in the input 
buffer at the end of a reception window when it is time to process the last expected message. This error is 
reported only at the end of a window. Similarly to the empty-buffer error, this type of error can only 
happen during synchronous reception or fixed-delay reception with a predetermined reception window 
size. When receiving in asynchronous-monitoring mode, this error cannot happen because there is no 
predetermined number of expected messages and all the received messages are output one tick after they 
are received. The value of syndrome IU_Input_Overrun(i) is valid when IU_Receiving is asserted and 
either IU_Strobe_Out(i) is asserted for asynchronous-monitoring reception or IU_Ready is asserted for 
the other reception modes. 

Figure A.15 shows an example for the case of a reception window with only one expected message. 
IU_Input_Overrun(i) is asserted at the end of the reception window because more than one message was 
received from channel i during the window. This example applies to the reception of a single-message 
synchronous stream, a multi-message synchronous stream with separate reception windows, and fixed- 
delay reception with a reception window of predetermined size. 

Figure A.16 shows an example for the case of a reception window with multiple expected messages. A 
three-message stream is expected. IU_Input_Overrun(i) is asserted at the end of the reception window 
because some received messages will remain in the input buffer. 
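
The empty-buffer and input-overrun checks can be illustrated with a simple first-in, first-out model of one input channel. The Python sketch below is not the IU implementation. It assumes that a message is available for output if it arrived strictly before the output tick, that the window ends at the last output tick, and that tick values are relative to the start of the window. The function name is hypothetical.

```python
from collections import deque


def receive_window(arrivals, output_ticks):
    """Model one channel's reception window.

    arrivals: list of (tick, message) pairs, sorted by tick.
    output_ticks: ascending ticks at which an expected message is output.
    Returns (outputs, input_overrun), where outputs holds one
    (message_or_None, empty_buffer) pair per expected message.
    """
    pending = deque(arrivals)
    fifo = deque()
    outputs = []
    for tick in output_ticks:
        # Assumption: a message is buffered if it arrived before this tick.
        while pending and pending[0][0] < tick:
            fifo.append(pending.popleft()[1])
        if fifo:
            outputs.append((fifo.popleft(), False))
        else:
            outputs.append((None, True))  # the output value is arbitrary
    # Messages still buffered after the last output indicate an overrun.
    return outputs, bool(fifo)


# W_RCV = 5 and DII = 3 give output ticks 5, 8, and 11 for three messages.
missing_b = receive_window([(1, "A"), (9, "C")], [5, 8, 11])
late_b = receive_window([(1, "A"), (9, "B")], [5, 8, 11])
early_c = receive_window([(1, "A"), (6, "C")], [5, 8, 11])
single_overrun = receive_window([(1, "A"), (3, "B")], [5])
```

In this model a missing or late second message raises the empty-buffer flag at the second output, a message C that arrives early is output in the second position, and a second message in a single-message window raises the input-overrun flag.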

Figure A.15: Generation of the input-overrun syndrome for a single-message reception window 

Figure A.16: Generation of the input-overrun syndrome for multi-message reception windows 


A.5.4. IU_Link_Error and IU_Imm_Link_Error 

The link-error syndromes are based on communication checks performed by the Communication 
Module (CM) receivers. Each receiver generates a vector of error syndromes the size of which is 
specified by the structural parameter Num_Link_Synd. Each link receiver asserts its corresponding 
CM_Strobe_In signal whenever it has a new message. In addition, CM_Strobe_In may also be asserted 
when a link error is detected. The IU is designed based on a simple transaction model by which the 
assertion of a CM_Strobe_In input indicates that a new message and error syndrome set is available from 
the corresponding link receiver. The IU is required to read the syndromes and appropriately forward 
them to its outputs. 

IU signals IU_Link_Error and IU_Imm_Link_Error are generated based on the error syndromes from 
the link receivers. Whenever a new transaction is received on input channel i, IU_Strobe_Out(i) is 
asserted one tick later and the syndromes are output on signal IU_Imm_Link_Error(i). The timing of 
IU_Imm_Link_Error is independent of reception or observation windows. On the other hand, the 
generation of signal IU_Link_Error is tied to these windows. For asynchronous-monitoring reception, the 
messages and syndromes for a particular channel are presented at the IU_Message_Out(i) and 
IU_Link_Error(i) outputs one tick after reception. For synchronous reception and for fixed-delay 
reception with a predetermined reception window size, the syndromes are output on the IU_Link_Error(i) 
signal at the same time the corresponding message content is output. Signal IU_Link_Error(i) is not 
asserted for receptions outside of reception or observation windows. The value of syndrome 
IU_Link_Error(i) is considered valid when IU_Receiving is asserted and either IU_Strobe_Out(i) is 
asserted for asynchronous-monitoring reception or IU_Ready is asserted for the other reception modes. 

Figure A.17 shows an example of link-error syndrome generation for asynchronous-monitoring 
reception. The syndrome bits for input channel i are represented by characters X, Y, and Z. Note that 
IU_Link_Error(i) is arbitrary for messages received outside of the observation window. 

Figure A.18 shows an example for synchronous reception of a three-message stream. Note that the 
timing to output the link syndromes on the IU_Link_Error(i) signal matches the timing to output 
CM_Message_In(i) on signal IU_Message_Out(i). For this particular example in which there is an input- 
overrun error, the content and syndromes for the extra message do not reach the IU_Message_Out(i) and 
IU_Link_Error(i) outputs, respectively. 
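
The validity conditions stated so far for the window-related syndromes follow a common pattern, summarized in the Python sketch below. The mode string and function names are hypothetical labels used only for this illustration. The signal arguments are plain booleans standing in for IU_Receiving, IU_Strobe_Out(i), and IU_Ready.

```python
def window_syndrome_valid(mode, iu_receiving, iu_strobe_out, iu_ready):
    """Validity of IU_Empty_Buffer(i), IU_Input_Overrun(i), IU_Link_Error(i)."""
    if not iu_receiving:
        return False
    if mode == "asynchronous-monitoring":
        # Messages are output one tick after reception, flagged by the strobe.
        return iu_strobe_out
    # Synchronous and fixed-delay reception report at the IU_Ready time.
    return iu_ready


def unexpected_message_valid(iu_receiving, iu_strobe_out):
    """Validity of IU_Unexpected_Message(i): a strobe outside any window."""
    return (not iu_receiving) and iu_strobe_out
```

For example, `window_syndrome_valid("synchronous", True, False, True)` evaluates to `True`, while the same call with IU_Receiving deasserted evaluates to `False`.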

Figure A.17: Generation of the link-error syndromes for an observation window with asynchronous-monitoring reception 

A.5.5. IU_Buffer_Overload and IU_Imm_Buffer_Overload 

The purpose of the buffer-overload error check is to detect when the reception rate for a multi-message 
synchronous stream with a continuous reception window is higher than expected. It is possible to 
compute a maximum expected input-buffer load (i.e., the number of stored messages) for the reception of 
a stream using known bounds on the drift rates of the physical clocks, the nominal DII of the stream, and 
the delay in processing the first message of the stream at the receiving node. A buffer-overload error 
occurs on an input channel when a message is received such that the buffer load will exceed the expected 
maximum. The error is assigned to the message that triggers the overload, not the next message to be 
output by the IU. 

A buffer-overload error cannot occur during an observation window with asynchronous-monitoring 
reception because, irrespective of the actual message reception rate, the buffer never holds more than one 
message, since all the received messages are output one tick after they are received. For synchronous 
reception and fixed-delay reception with a predetermined window size, signal 
IU_Imm_Buffer_Overload(i) for input channel i is asserted one tick after an overload-triggering message 
is received. The value of signal IU_Imm_Buffer_Overload(i) is valid whenever IU_Receiving and 
IU_Strobe_Out(i) are asserted, and it is arbitrary for all other conditions. The timing to output the 
syndrome on signal IU_Buffer_Overload(i) matches the timing to output the received message 
content on signal IU_Message_Out(i). Thus, if an overload is detected, IU_Buffer_Overload(i) is 
asserted when the offending received message reaches the output. For reception windows with a single 
expected message, an overload occurs when more than one message is received, but 
IU_Buffer_Overload(i) is not asserted because the extra messages do not reach the output. This last type 
of error is also an input-overrun error, and thus the IU_Input_Overrun(i) signal will be asserted at the time 
to output the expected message. The value of signal IU_Input_Overrun(i) is valid when IU_Receiving and 
IU_Ready are asserted, and it is arbitrary for all other conditions. 

The structural parameter Input_FIFO_Depth specifies the minimum size for each of the input buffers. 
The actual buffer size is determined by the IU design and implementation. During a reception window, 
the messages stored in an input buffer must not be overwritten. If an input buffer becomes full during a 
reception window, the required IU response is to not load (i.e., to dump) any additional messages received 
on the corresponding input channel until there is space available in the buffer. The only exception is 
when there is a simultaneous reading of the buffer; in that case, the required response is to load the 
received message. Dumped messages are not reflected in the IU_Message_Out, IU_Link_Error and 
IU_Buffer_Overload signals. The reception of the dumped messages is always reflected in the 
corresponding fields of the signals IU_Strobe_Out, IU_Imm_Link_Error and IU_Imm_Buffer_Overload 
since there is no memory storage restriction associated with these signals. Note that a buffer overflow 
condition can occur for every reception mode except when asynchronous-monitoring is active. 
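
The overload and full-buffer rules above can be sketched with a small buffer model. The Python class below is a hypothetical illustration, not the IU design. It assumes that the overload flag is computed from the load the buffer would have after storing the new message, and that a read occurring in the same tick is performed before the write.

```python
from collections import deque


class InputBuffer:
    """Hypothetical model of one IU input buffer during a reception window."""

    def __init__(self, depth, max_expected_load):
        self.depth = depth                      # e.g. Input_FIFO_Depth
        self.max_expected_load = max_expected_load
        self.fifo = deque()                     # entries: (message, overload flag)

    def receive(self, message):
        """Store a message and return (stored, imm_overload).

        For a read in the same tick, call read() first so that the freed
        slot is available, which mirrors the simultaneous-read exception.
        """
        imm_overload = len(self.fifo) + 1 > self.max_expected_load
        if len(self.fifo) >= self.depth:
            return False, imm_overload          # buffer full: message is dumped
        self.fifo.append((message, imm_overload))
        return True, imm_overload

    def read(self):
        """Pop the oldest entry as (message, overload flag), or None if empty."""
        return self.fifo.popleft() if self.fifo else None


# Three messages arrive before any is read; the maximum expected load is two.
buf = InputBuffer(depth=9, max_expected_load=2)
first_window = [buf.receive(m) for m in "ABC"]
```

Here the third message triggers the immediate overload flag, and the stored flag travels with that message so that it is reported again when the message is read out. A dumped message still produces the immediate flag but never reaches the output.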

Figure A.19 shows an example of the generation of input-overload syndromes for the reception of two 
synchronous-message streams. Three messages are expected during each window, and the maximum 
expected load for the input buffer is two for both cases. For the first window, three messages are received 
within the expected-reception interval for the first message. Message C triggers the buffer overload. This 
is reported one tick later on signal IU_Imm_Buffer_Overload(i) and at the output time of the third 
message on signal IU_Buffer_Overload(i). For the second window, four messages are received and 
message D triggers the overload. This is reported one tick later on signal IU_Imm_Buffer_Overload(i), 
but it is never reported on signal IU_Buffer_Overload(i) because the D message does not reach the 
output. Nevertheless, IU_Input_Overrun(i) is asserted in this case. 

Figure A.19: Generation of the input-overload syndromes for synchronous reception 


A.5.6. IU_Frame_Sync_Error 

The errors detected during the execution of the Frame Synchronization process are reported when the 
process completes. Figure A.20 illustrates this. The set of syndrome bits is represented by character Z. 
For this example, the gap timer is set to expire after six ticks. The IU_Frame_Sync_Error syndromes are 
valid only when the Frame Synchronization process is complete, which is signaled by the third assertion 
of IU_Ready in the PD_Sync_Cap minor mode. The value of IU_Frame_Sync_Error is arbitrary at all 
other times. 

Figure A.20: Generation of the Frame Synchronization syndromes 


A.6. Response to MCU commands 

The IU reacts to nine different MCU commands. A node-reset command, indicated by asserting the 
MCU_Node_Reset signal, requires the IU to immediately return to a default state independently of the 
current activity or the value of other MCU command signals. This command may be issued at any time 
and the IU must respond to it. The other commands relevant to the behavior of the IU are indicated by the 
command signal MCU_Minor_Mode. The MCU will issue these commands only when the IU is idle and 
ready to receive a command. The IU design is required to execute each MCU command independently, 
and no assumption should be made about the sequence in which the commands are issued. 

Note that the IU is required to begin every reception or observation window with empty input buffers. 


A.6.1. Node_Reset 

The IU is required to return to its default state at most two ticks after the MCU_Node_Reset signal is 
asserted. The IU response includes setting its output signals to their default values as stated in Table A.2 
and invalidating the current PE communication schedule. In its default state, the IU is idle and waiting 
for a new command from the MCU. 

Figure A.21 shows an example of a node-reset command. Note that IU_Strobe_Out must be returned 
to its default value and held there while MCU_Node_Reset is asserted, and it is allowed to operate 
normally afterwards. 

Figure A.21: Response to MCU command: MCU_Node_Reset 


A.6.2. Minor_Mode = Reset 

The response of the IU to a Reset minor mode command is similar to the response to a node-reset 
command, except that the IU must be in its default state one tick after the command is issued. Figure 
A.22 shows an example of the required response. The “Reset” value of signal MCU_Minor_Mode is 
represented here by the letter X. 

Figure A.22: Response to MCU command: Minor_Mode = Reset 


A.6.3. Minor_Mode = Collective_Diagnosis 

For Collective Diagnosis the IU expects to receive messages during four separate synchronous- 
reception windows. For each reception window, only one message is expected from each opposite-kind 
node. The size of each reception window is given by parameter Syncns_Wnd_Sz. The delay from one 
tick after MCU_Ready is asserted to the beginning of the first window is given by CD_Wnd1_Dly. The 
time interval between windows is given by CD_Wnd2_Dly. Figure A.23 illustrates the IU response for 
CD_Wnd1_Dly = 3, CD_Wnd2_Dly = 2, and Syncns_Wnd_Sz = 4. The letter “X” on the 
MCU_Minor_Mode waveform represents the “Collective_Diagnosis” value. The label RxWnd stands for 
“Reception Window”. During the execution of this minor mode, the values of signals IU_PE_or_Acc, 
IU_Source_Id, and IU_Last_Message are arbitrary. 
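
The placement of the four Collective Diagnosis windows follows directly from the three parameters. The Python sketch below assumes that tick 0 is the reference point from which the first-window delay is counted and that windows are half-open tick ranges. The function name is hypothetical.

```python
def cd_windows(wnd1_dly, wnd2_dly, wnd_sz, count=4):
    """Return the (begin, end) tick range of each Collective Diagnosis window."""
    windows = []
    begin = wnd1_dly
    for _ in range(count):
        windows.append((begin, begin + wnd_sz))
        # The next window opens wnd2_dly ticks after this one closes.
        begin += wnd_sz + wnd2_dly
    return windows


# First-window delay 3, inter-window delay 2, window size 4.
example = cd_windows(3, 2, 4)
```

With the example values, the windows occupy ticks 3 to 7, 9 to 13, 15 to 19, and 21 to 25 relative to the reference point.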

Figure A.23: Response to MCU command: Minor_Mode = Collective_Diagnosis 


A.6.4. Minor_Mode = Schedule_Update 

For Schedule Update the IU expects to receive two synchronous-message streams from each opposite- 
kind node. The number of messages in each stream is equal to the number of BIUs, which is denoted by 
N and given by structural parameter Num_BIU. Each stream is received with separate reception windows 
for each message or with a continuous reception window for the full stream. Parameter SU_Wnd2_Dly 
specifies the duration of the time interval between expected-reception intervals for each message of a 
stream. If SU_Wnd2_Dly = 0, then each stream is received with a continuous reception window. 
Parameter SU_P1_Wnd1_Dly(MCU_Node_Kind), which is a function of the local node kind, specifies 
the delay to the expected-reception interval for the first message of the first stream. Parameter 
SU_P2_Wnd1_Dly(MCU_Node_Kind) specifies the duration of the time interval between the last 
window for the first stream and the first window for the second stream. SU_DII specifies the DII for both 
streams. Syncns_Wnd_Sz specifies the duration of the expected-reception interval for each message. 
Parameter SU_Max_Buff_Cnt specifies the maximum number of messages that an input buffer is 
expected to hold during a reception window, separate or continuous, under normal operation. 

In addition to receiving the message streams, during the execution of this command the IU receives 
the new schedule from the Route-and-Vote Unit (RVU) in the form of N consecutive messages 
corresponding to the processing results for the messages of the second stream. No particular time delay 
should be assumed between IU_Ready and RVU_Ready. After receiving the RVU results, the IU must 
assess the new schedule and update the schedule assessment outputs. Parameter Max_Num_PE_Msg 
specifies the maximum number of PE messages that can be scheduled. If the new schedule is invalid, a 
default schedule is used in which each PE is allocated Dflt_Num_PE_Msg messages. 

Figure A.24 illustrates the IU response for a system with 3 BIUs and the following reception 
parameter values: SU_P1_Wnd1_Dly(MCU_Node_Kind) = 1, SU_P2_Wnd1_Dly(MCU_Node_Kind) = 
2, Syncns_Wnd_Sz = 3, SU_Wnd2_Dly = 1, and SU_DII = 4. The letter “X” on the MCU_Minor_Mode 
signal waveform represents the “Schedule_Update” value of that signal. Note that IU_Last_Message is 
active in the execution. However, the values of signals IU_PE_or_Acc and IU_Source_Id are arbitrary. 
Since separate reception windows are used for each message, SU_Max_Buff_Cnt = 1 in each case. The 
new schedule is represented by values D, E, and F on signal RVU_Transform_Result. The worst-case 
delay to generate the schedule assessment outputs measured with respect to the last assertion of 
RVU_Ready is determined by the IU design. The values of IU_Zero_Schedule and IU_Invalid_Schedule 
must be held constant until the Schedule_Update command is issued again. 

Figure A.24: Response to MCU command: Minor_Mode = Schedule_Update (separate reception windows) 


Figure A.25 illustrates the response for continuous reception windows with SU_Wnd2_Dly = 0 and 
SU_DII = 2. The other parameters have the same values as in Figure A.24. 
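
The two Schedule Update examples can be compared by computing the expected-reception intervals of both streams. The Python sketch below is illustrative only. It assumes that tick 0 is the reference point for the first-stream delay, that the interval of message k in a stream starts k*SU_DII ticks after the first interval of that stream, and that the second stream's first interval begins the stated delay after the end of the first stream's last interval. The function name and the returned dictionary layout are hypothetical.

```python
def su_intervals(n, p1_dly, p2_dly, wnd_sz, wnd2_dly, dii):
    """Expected-reception intervals for both Schedule Update streams."""
    first = [(p1_dly + k * dii, p1_dly + k * dii + wnd_sz) for k in range(n)]
    # The second stream is placed relative to the end of the first stream.
    second_begin = first[-1][1] + p2_dly
    second = [(second_begin + k * dii, second_begin + k * dii + wnd_sz)
              for k in range(n)]
    # A zero inter-interval delay selects a continuous reception window.
    return {"continuous": wnd2_dly == 0, "stream1": first, "stream2": second}


# Separate windows: N = 3, delays 1 and 2, size 3, SU_Wnd2_Dly = 1, SU_DII = 4.
separate = su_intervals(3, 1, 2, 3, 1, 4)

# Continuous windows: SU_Wnd2_Dly = 0 and SU_DII = 2, other values unchanged.
continuous = su_intervals(3, 1, 2, 3, 0, 2)
```

In the separate-window case the intervals are disjoint with a one-tick gap, while in the continuous case consecutive intervals overlap by one tick and each stream is covered by a single window.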

Figure A.25: Response to MCU command: Minor_Mode = Schedule_Update (continuous reception windows) 


A.6.5. Minor_Mode = PE_Communication 

Irrespective of the schedule assessment results, the MCU will always issue a command to transition to 
the PE Communication mode. The response of the IU depends on the value of signal 
MCU_Schedule_Status. If MCU_Schedule_Status = Zero, the IU ignores the command and waits for the 
next one. If MCU_Schedule_Status = Valid, the latest loaded schedule is executed. If 
MCU_Schedule_Status = Invalid, the default schedule is executed. 

For MCU_Schedule_Status ≠ Zero, the IU expects to receive from each opposite-kind source a 
synchronous-message stream composed of the scheduled PE messages followed by a message for the 
Accusation Exchange protocol. Parameter PE_Wnd2_Dly specifies the duration of the time interval 
between the expected-reception intervals for each message. If PE_Wnd2_Dly = 0, then the stream is 
received with a continuous reception window. Parameter PE_Wnd1_Dly(MCU_Node_Kind) specifies 
the delay from one tick after the command is issued until the time at which the expected-reception 
interval for the first message begins. Syncns_Wnd_Sz specifies the duration of the expected-reception 
interval for each message. PE_DII specifies the data-introduction interval, which applies to all the 
messages in the stream, including the Accusation Exchange protocol message at the trailing end. 
Parameter PE_Max_Buff_Cnt specifies the maximum expected number of messages stored in each input 
buffer during the reception windows. 

Figure A.26 illustrates the IU response for a system with 4 BIUs and schedule {(1, 2), (3, 1), (4, 2)}, 
where for each (a, b) pair “a” specifies the PE source identification number and “b” specifies the number 
of scheduled messages. The parameter values are as follows: PE_Wnd1_Dly(MCU_Node_Kind) = 1, 
PE_Wnd2_Dly = 1, Syncns_Wnd_Sz = 3, and PE_DII = 4. Since the messages are received with separate 
reception windows, PE_Max_Buff_Cnt = 1 for each window. 
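
The schedule selection and the resulting message stream can be sketched as follows. This Python example is a hedged illustration, not the IU implementation. It assumes that PE sources are numbered 1 through Num_BIU, that the default schedule lists them in ascending order, and that the status values are represented by the strings shown. The function names and the tuple encoding of stream entries are hypothetical.

```python
def select_schedule(status, loaded, num_biu, dflt_num_pe_msg):
    """Pick the schedule to execute from the MCU_Schedule_Status value."""
    if status == "Zero":
        return None                      # command is ignored
    if status == "Valid":
        return loaded                    # latest loaded schedule
    if status == "Invalid":
        # Default schedule: each PE is allocated the same number of messages.
        return [(pe, dflt_num_pe_msg) for pe in range(1, num_biu + 1)]
    raise ValueError("unknown schedule status: " + str(status))


def expected_stream(schedule):
    """Expand (source, count) pairs into the expected message sequence."""
    stream = [("PE", pe) for pe, count in schedule for _ in range(count)]
    # The stream ends with the Accusation Exchange protocol message.
    stream.append(("ACC", None))
    return stream


loaded = [(1, 2), (3, 1), (4, 2)]
stream = expected_stream(select_schedule("Valid", loaded, 4, 1))
default = select_schedule("Invalid", loaded, 4, 1)
```

For the example schedule, the stream contains five PE messages followed by one Accusation Exchange message, so six expected-reception intervals are needed.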

Figure A.26: Response to MCU command: Minor_Mode = PE_Communication (separate reception windows) 

Figure A.27 illustrates the IU response with a continuous reception window. For this case, 
PE_Wnd2_Dly = 0, PE_DII = 2, and PE_Max_Buff_Cnt has a value larger than 1. 

Figure A.27: Response to MCU command: Minor_Mode = PE_Communication (continuous reception window) 

A.6.6. Minor_Mode = Sync_Preservation 

For this mode the IU expects to receive INIT and ECHO synchronization messages from each 
opposite-kind source in two separate reception windows. Fixed-delay reception is used in each case. 
Each window is characterized by delay and size parameters. Figure A.28 illustrates the IU response. For 
the INIT window, parameter SP_INIT_Wnd_Dly(MCU_Node_Kind) specifies the delay to the expected- 
reception interval with respect to the time one tick after the MCU command is issued, and parameter 
SP_INIT_Wnd_Sz(MCU_Node_Kind) specifies the duration of the reception interval. For the ECHO 
window, parameter SP_ECHO_Wnd_Dly(MCU_Node_Kind) specifies the delay to the expected- 
reception interval with respect to the time one tick after the RVU_Accept_INIT is asserted, and parameter 
SP_ECHO_Wnd_Sz(MCU_Node_Kind) specifies the duration of the reception interval. Note that the 
RVU_Accept_INIT is monitored only after the IU_Ready signal is asserted. 

Figure A.28: Response to MCU command: Minor_Mode = Sync_Preservation 


A.6.7. Minor_Mode = Self_Test 

This version of the RPP does not require the implementation of a self-test. When the MCU issues this 
command, the IU simply ignores it and waits for the next command. 


A.6.8. Minor_Mode = PD_Sync_Capture 

This command corresponds to the Local Diagnosis Acquisition and Synchronization Acquisition 
modes executed consecutively. Figure A.29 illustrates the IU response for this command. The IU 
supports two activity threads in this mode. The first activity thread consists of four steps: measuring two 
consecutive time intervals, executing the Frame Synchronization protocol, and then enabling fixed-delay 
reception for ECHO messages. This first thread defines a continuous observation window composed of 
four intervals corresponding to the four steps. The second activity thread runs in parallel with the first 
one and consists of performing asynchronous-monitoring reception for each input channel during the full 
duration of the observation window defined by the first thread. 

Starting one tick after the MCU command is issued, the IU measures two consecutive observation 
intervals, the first one of duration PD_Rdy1_Dly and the second one of duration PD_Rdy2_Dly. 
IU_Ready signals the completion of the intervals. The Frame Synchronization (FS) process is enabled 
upon completion of the second interval. When the FS process is complete, the Synchronization Capture 
interval begins, during which fixed-delay reception for ECHO messages is performed. The observation 
window is closed when the RVU_Accept_ECHO signal is asserted. 

Figure A.29: Response to MCU command: Minor_Mode = PD_Sync_Capture 


A.6.9. Minor_Mode = ID_Initial_Sync 

This command triggers the execution of Initial Diagnosis followed by Initial Synchronization. Figure 
A.30 illustrates the IU response. For Initial Diagnosis, the IU performs synchronous reception with one 
expected message from each opposite-kind source. Parameter ID_Wnd_Dly specifies the delay from one 
tick after the MCU command is issued until the beginning of the expected-reception interval. Parameter 
ID_Wnd_Sz specifies the duration of the reception window. After the closing of the Initial Diagnosis 
window, the IU waits for IS_Wnd_Dly to enable fixed-delay reception for the INIT and ECHO messages 
of the Initial Synchronization protocol. Simultaneously, the IU performs asynchronous-monitoring 
reception. The observation window closes when RVU_Accept_ECHO is asserted. 

Figure A.30: Response to MCU command: Minor_Mode = ID_Initial_Sync 


A.7. Required parameters 

This section lists the required structural and behavioral parameters for the Input Unit. All the 
parameters are of type Natural. The values of these parameters are set prior to simulation and synthesis, 
and they remain fixed afterwards. 

Additional parameters may be identified during the IU design process. 


A.7.1. Structural parameters 


Table A.3: Required structural parameters 

| Description | Parameter Label | Allowed Value Range | Typical Value |
|---|---|---|---|
| Number of BIUs (= N) | Num_BIU | ≥ 1 | 3 |
| Number of opposite-kind nodes (= Ω) | Num_OK | ≥ 1 | 3 |
| Number of syndromes from each link receiver (= L_ls) | Num_Link_Synd | ≥ 1 | 1 |
| Minimum size of the input buffers | Input_FIFO_Depth | ≥ 1 | 9 |


A.7.2. Behavioral parameters 

Table A.4 lists the IU behavioral parameters. Some parameters are a function of the node kind 
specified by the MCU_Node_Kind signal. All the behavioral parameters have units of local clock ticks. 

Table A.4: Required behavioral parameters 

| Description | Parameter Label | Allowed Value Range | Typical Value |
|---|---|---|---|
| First IU_Ready delay for Local Diagnosis Acquisition | PD_Rdy1_Dly | ≥ 1 | 5001 |
| Second IU_Ready delay for Local Diagnosis Acquisition | PD_Rdy2_Dly | ≥ 1 | 5001 |
| Window delay for Initial Diagnosis | ID_Wnd_Dly | ≥ 0 | 0 |
| Window size for Initial Diagnosis | ID_Wnd_Sz | ≥ 1 | 50000 |
| Window delay for Initial Synchronization | IS_Wnd_Dly | ≥ 1 | 3 |
| INIT window delay for Synchronization Preservation | SP_INIT_Wnd_Dly(Node_Kind) | ≥ 0 | RMU: 3; BIU: 20 |
| INIT window size for Synchronization Preservation | SP_INIT_Wnd_Sz(Node_Kind) | ≥ 1 | RMU: 7; BIU: 9 |
| ECHO window delay for Synchronization Preservation | SP_ECHO_Wnd_Dly(Node_Kind) | ≥ 0 | RMU: 22; BIU: 23 |
| ECHO window size for Synchronization Preservation | SP_ECHO_Wnd_Sz(Node_Kind) | ≥ 1 | RMU: 9; BIU: 7 |
| Delay for INIT pulses | INIT_Pls_Dly(Node_Kind) | ≥ 1 | RMU: 8; BIU: 10 |
| Width parameter for INIT pulses | INIT_Skw(Node_Kind) | ≥ 1 | RMU: 3; BIU: 4 |
| Delay for ECHO pulses | ECHO_Pls_Dly(Node_Kind) | ≥ 1 | RMU: 10; BIU: 8 |
| Width parameter for ECHO pulses | ECHO_Skw(Node_Kind) | ≥ 1 | RMU: 3; BIU: 3 |
| First window delay for Collective Diagnosis | CD_Wnd1_Dly | ≥ 0 | 4 |
| Delay between the windows for Collective Diagnosis | CD_Wnd2_Dly | ≥ 1 | 5 |
| First window delay for the first message stream in Schedule Update | SU_P1_Wnd1_Dly(Node_Kind) | ≥ 0 | RMU: 3; BIU: 15 |
| First window delay for the second message stream in Schedule Update | SU_P2_Wnd1_Dly(Node_Kind) | ≥ 1 | RMU: 15; BIU: 16 |
| Delay between expected-reception intervals for the first and second message streams in Schedule Update | SU_Wnd2_Dly | ≥ 0 | 0 |
| Data introduction interval for the first and second message streams in Schedule Update | SU_DII | ≥ 1 | 2 |
| Maximum expected input buffer load for the first and second message stream in Schedule Update | SU_Max_Buff_Cnt | ≥ 1 | 4 |
| Maximum number of PE messages that can be processed during PE Communication minor mode | Max_Num_PE_Msg | ≥ 1 | 4645 |

Number of PE messages per PE for the 
default communication schedule 

Dflt_N um_PE_Msg 

> 1 

50 

First window delay for the message 
stream in PE Communication 

PE_Wndl_Dly(Node_Kind) 

>0 

RMU: 4 
BIU: 16 


230 




Description 

Parameter Label 

Allowed 
Value Range 

Typical 

Value 

Delay between expected-reception 
intervals for the message stream in PE 
Communication 

PE_Wnd2_Dly 

>0 

0 

Data introduction interval for the 
message stream in PE Communication 

PE_DII 

> 1 

2 

Maximum expected input buffer load 
for the message stream in PE 
Communication 

PE_Max_B uff_Cnt 

> 1 

4 

Size of expected-reception interval for 
synchronous messages (Does not apply 
to Initial Diagnosis messages) 

Syncns_Wnd_Sz 

> 1 

7 

Gap timer timeout for Frame 
Synchronization 

Frm_Sync_Gap(Node_Kind) 

> 1 

RMU: 3 
BIU: 3 
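Several entries in Table A.4 are functions of the node kind. One way to picture this dependence is a lookup keyed first by parameter label and then by node kind. The sketch below is illustrative only: the names `NODE_KIND_PARAMS`, `resolve`, and `sp_windows` are hypothetical, the node kind is represented by the strings "RMU" and "BIU" rather than by the MCU_Node_Kind signal, and only a subset of the typical values from the table is included.

```python
# Typical values from Table A.4 for a subset of node-kind-dependent parameters.
NODE_KIND_PARAMS = {
    "SP_INIT_Wnd_Dly": {"RMU": 3, "BIU": 20},
    "SP_INIT_Wnd_Sz": {"RMU": 7, "BIU": 9},
    "SP_ECHO_Wnd_Dly": {"RMU": 22, "BIU": 23},
    "SP_ECHO_Wnd_Sz": {"RMU": 9, "BIU": 7},
    "INIT_Pls_Dly": {"RMU": 8, "BIU": 10},
    "INIT_Skw": {"RMU": 3, "BIU": 4},
    "ECHO_Pls_Dly": {"RMU": 10, "BIU": 8},
    "ECHO_Skw": {"RMU": 3, "BIU": 3},
    "Frm_Sync_Gap": {"RMU": 3, "BIU": 3},
}


def resolve(label, node_kind):
    """Return the value of a node-kind-dependent parameter, in clock ticks."""
    if node_kind not in ("RMU", "BIU"):
        raise ValueError(f"unknown node kind: {node_kind!r}")
    return NODE_KIND_PARAMS[label][node_kind]


def sp_windows(node_kind):
    """Return (delay, size) of the INIT and ECHO windows for one node kind."""
    return {
        "INIT": (resolve("SP_INIT_Wnd_Dly", node_kind),
                 resolve("SP_INIT_Wnd_Sz", node_kind)),
        "ECHO": (resolve("SP_ECHO_Wnd_Dly", node_kind),
                 resolve("SP_ECHO_Wnd_Sz", node_kind)),
    }


print(sp_windows("RMU"))
print(sp_windows("BIU"))
```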


A.8. Additional remarks 

This section presents miscellaneous requirements and remarks not covered elsewhere in the 
document. 


A.8.1. Required registered outputs 

The IU is the first processing stage of the RPP for received messages. Its outputs are read directly by 
the MCU, IDU, and NDU units. To prevent excessive CLK-to-IU-output latency, which could have a 
strong negative impact on the overall processing performance of the RPP, the following IU outputs are 
required to be registered (i.e., to be the direct output of a flip-flop, register, or memory element): 


• IU_Message_Out
• IU_Strobe_Out
• IU_Receiving
• IU_Ready
• IU_Source_Id
• IU_PE_or_ACC
• IU_Last_Message
• IU_Unexpected_Message
• IU_Link_Error
• IU_Imm_Link_Error
• IU_Empty_Buffer
• IU_Buffer_Overload
• IU_Imm_Buffer_Overload
• IU_Input_Overrun


A.8.2. Relevant parameter relations 

The following relations are satisfied for the current design of the RPP. 

• INIT_Pls_Dly(Node_Kind) > SP_INIT_Wnd_Sz(Node_Kind) (A.15)

• ECHO_Pls_Dly(Node_Kind) > SP_ECHO_Wnd_Sz(Node_Kind) (A.16)

• INIT_Skw(Node_Kind) < SP_INIT_Wnd_Sz(Node_Kind) (A.17)

• ECHO_Skw(Node_Kind) < SP_ECHO_Wnd_Sz(Node_Kind) (A.18)

• Frm_Sync_Gap(Node_Kind) = ECHO_Skw(Node_Kind) (A.19)

• SU_DII ≥ 2 (A.20)

• PE_DII ≥ 2 (A.21)

• SU_Max_Buff_Cnt = 1 for SU_Wnd2_Dly > 1 (A.22)

• PE_Max_Buff_Cnt = 1 for PE_Wnd2_Dly > 1 (A.23)

• SU_Max_Buff_Cnt < Syncns_Wnd_Sz (A.24)

• PE_Max_Buff_Cnt < Syncns_Wnd_Sz (A.25)

• Input_FIFO_Depth > max(SU_Max_Buff_Cnt, PE_Max_Buff_Cnt) + 1 (A.26)
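The relations above lend themselves to an automated consistency check on a parameter set. The following sketch evaluates relations A.15 through A.26 against the typical values of Tables A.3 and A.4. It is an illustration, not part of the RPP design: the names `TYPICAL` and `check_relations` are hypothetical, and relations A.20 and A.21 are read as "at least 2", which is consistent with the typical value of 2 for SU_DII and PE_DII. The remaining relations are evaluated as written.

```python
# Typical values taken from Tables A.3 and A.4 (units: local clock ticks).
TYPICAL = {
    "INIT_Pls_Dly": {"RMU": 8, "BIU": 10},
    "ECHO_Pls_Dly": {"RMU": 10, "BIU": 8},
    "SP_INIT_Wnd_Sz": {"RMU": 7, "BIU": 9},
    "SP_ECHO_Wnd_Sz": {"RMU": 9, "BIU": 7},
    "INIT_Skw": {"RMU": 3, "BIU": 4},
    "ECHO_Skw": {"RMU": 3, "BIU": 3},
    "Frm_Sync_Gap": {"RMU": 3, "BIU": 3},
    "SU_DII": 2, "PE_DII": 2,
    "SU_Wnd2_Dly": 0, "PE_Wnd2_Dly": 0,
    "SU_Max_Buff_Cnt": 4, "PE_Max_Buff_Cnt": 4,
    "Syncns_Wnd_Sz": 7,
    "Input_FIFO_Depth": 9,
}


def check_relations(p, node_kinds=("RMU", "BIU")):
    """Return the labels of violated relations (empty list if all hold)."""
    violated = []

    def require(label, holds):
        if not holds:
            violated.append(label)

    # Relations that depend on the node kind are checked for each kind.
    for k in node_kinds:
        require(f"A.15[{k}]", p["INIT_Pls_Dly"][k] > p["SP_INIT_Wnd_Sz"][k])
        require(f"A.16[{k}]", p["ECHO_Pls_Dly"][k] > p["SP_ECHO_Wnd_Sz"][k])
        require(f"A.17[{k}]", p["INIT_Skw"][k] < p["SP_INIT_Wnd_Sz"][k])
        require(f"A.18[{k}]", p["ECHO_Skw"][k] < p["SP_ECHO_Wnd_Sz"][k])
        require(f"A.19[{k}]", p["Frm_Sync_Gap"][k] == p["ECHO_Skw"][k])

    require("A.20", p["SU_DII"] >= 2)  # assumed "at least 2"
    require("A.21", p["PE_DII"] >= 2)  # assumed "at least 2"
    # A.22 and A.23 are conditional: they only constrain the buffer count
    # when the corresponding window delay satisfies the stated condition.
    if p["SU_Wnd2_Dly"] > 1:
        require("A.22", p["SU_Max_Buff_Cnt"] == 1)
    if p["PE_Wnd2_Dly"] > 1:
        require("A.23", p["PE_Max_Buff_Cnt"] == 1)
    require("A.24", p["SU_Max_Buff_Cnt"] < p["Syncns_Wnd_Sz"])
    require("A.25", p["PE_Max_Buff_Cnt"] < p["Syncns_Wnd_Sz"])
    require("A.26", p["Input_FIFO_Depth"]
            > max(p["SU_Max_Buff_Cnt"], p["PE_Max_Buff_Cnt"]) + 1)
    return violated


print(check_relations(TYPICAL))
```

A check of this kind could be run whenever a parameter value is changed, before simulation or synthesis, to flag combinations that fall outside the relations satisfied by the current design.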